FIRST ORDER CONSTRAINED OPTIMIZATION ALGORITHMS
WITH FEASIBILITY UPDATES

C.H. JEFFREY PANG 


Abstract. We propose first order algorithms for convex optimization problems where the feasible set is described by a large number of convex inequalities that is to be explored by subgradient projections. The first algorithm is an adaptation of a subgradient algorithm, and has convergence rate $1/\sqrt{k}$. The second algorithm has convergence rate $1/k$ when (1) one has linear metric inequality in the feasible set, (2) the objective function is strongly convex, differentiable and has Lipschitz gradient, and (3) it is easy to optimize the objective function on the intersection of two halfspaces. This second algorithm generalizes Haugazeau's algorithm. The third algorithm adapts the second algorithm when condition (3) is dropped. We give examples to show that the second algorithm performs poorly when the objective function is not strongly convex, or when the linear metric inequality is absent.

1. Introduction

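Let $f : \mathbb{R}^n \to \mathbb{R}$ and $f_j : \mathbb{R}^n \to \mathbb{R}$, where $j \in \{1, \dots, m\}$, be convex functions. Let $Q \subset \mathbb{R}^n$ be a closed convex set. The problem that we study in this paper is

$$ \begin{array}{ll} \min & f(x) \\ \text{s.t.} & f_j(x) \le 0 \text{ for } j \in \{1, \dots, m\} \\ & x \in Q. \end{array} \tag{1.1} $$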

If $m$ is large, then it might be difficult for an algorithm to find an $x$ satisfying the stated constraints, let alone solve the optimization problem. We now recall material relevant to our approach for solving (1.1).

1.1. Projection methods for solving feasibility problems. For finitely many closed convex sets $C_1, \dots, C_m$ in $\mathbb{R}^n$, the Set Intersection Problem (SIP) is stated as:

$$ \text{(SIP):} \quad \text{Find } x \in C := \bigcap_{j=1}^{m} C_j, \text{ where } C \neq \emptyset. \tag{1.2} $$

The SIP is also referred to as the feasibility problem in the literature. When $m$ is large, the Method of Alternating Projections (MAP) is a reasonable way to solve the SIP. As its name suggests, the MAP finds the sequence $\{x_k\}$ by projecting onto the $C_j$ cyclically, i.e., $x_{k+1} = P_{C_{k'}}(x_k)$, where $k'$ is the number in $\{1, \dots, m\}$ such that $m$ divides $k - k'$. We refer the reader to [BB96, BR09, ER11], as well as [Deu01, Chapter 9] and [BZ05, Subsubsection 4.5.4], for more on the literature of using projection methods to solve the SIP.
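
The cyclic rule $x_{k+1} = P_{C_{k'}}(x_k)$ can be illustrated with a minimal sketch. The two sets below (a unit ball and a halfspace) and all function names are illustrative assumptions and do not come from the paper; any sets with computable projections could be substituted.

```python
import numpy as np

def project_ball(x, center, radius):
    """Euclidean projection onto the closed ball B(center, radius)."""
    d = x - center
    norm = np.linalg.norm(d)
    return x if norm <= radius else center + radius * d / norm

def project_halfspace(x, a, b):
    """Euclidean projection onto the halfspace {y : <a, y> <= b}."""
    viol = a @ x - b
    return x if viol <= 0 else x - viol * a / (a @ a)

def alternating_projections(x0, projections, sweeps=200):
    """Method of alternating projections: project onto each set cyclically."""
    x = np.asarray(x0, dtype=float)
    for _ in range(sweeps):
        for proj in projections:
            x = proj(x)
    return x

# Hypothetical example: unit ball intersected with {x : x_1 + x_2 >= 1}.
projections = [
    lambda x: project_ball(x, np.array([0.0, 0.0]), 1.0),
    lambda x: project_halfspace(x, np.array([-1.0, -1.0]), -1.0),
]
x_map = alternating_projections([3.0, -2.0], projections)
```

The two example sets intersect with nonempty interior, so the iterates settle quickly on a point of the intersection. For sets that meet tangentially one would expect much slower progress.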

The convergence rate of the MAP is linear under the assumption of linear regularity. The notion was introduced and studied by [Bau96] (Definition 4.2.1, page 53) in a general setting of a Hilbert space. See also [BB96] (Definition 5.6, page 40). Recently, it has been studied in [DH06a, DH06b, DH08]. The connection with the stability under perturbation of the sets $C_j$ is investigated in [Kru04, Kru06] and other works.

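Another problem closely related to the SIP is the Best Approximation Problem (BAP), stated as

$$ \text{(BAP):} \quad \begin{array}{ll} \min\limits_{x \in \mathbb{R}^n} & \frac{1}{2}\|x - x_0\|^2 \\ \text{s.t.} & x \in C := \bigcap_{j=1}^{m} C_j. \end{array} \tag{1.3} $$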

In other words, the BAP is the problem of finding the projection of $x_0$ onto $C$. The BAP follows the template of (1.1) when $f(x) = \frac{1}{2}\|x - x_0\|^2$, $f_j(x) = d(x, C_j)$ for each $j \in \{1, \dots, m\}$, and $Q = \mathbb{R}^n$. Dykstra's algorithm [Dyk83, BD86] is a projection algorithm for solving the BAP. It was rediscovered in [Han88] using mathematical programming duality. Another algorithm is Haugazeau's algorithm [Hau68] (see [BC11]). The convergence rate of Dykstra's algorithm has been analyzed in the polyhedral case [DH94, Xu00], but little is known about the general convergence rates of Dykstra's and Haugazeau's algorithms.

For more on the background and recent developments of the MAP and its variants, we refer the reader to [BB96, BR09, ER11], as well as [Deu01, Chapter 9] and [BZ05, Subsubsection 4.5.4].

1.2. First order algorithms and algorithms for (1.1). First order methods in optimization are methods based on function values and gradient evaluations. Even though first order methods have a slower rate of convergence than other algorithms, their advantage is that each iteration is easy to perform. For large scale problems, algorithms with better complexity require too much computational effort per iteration, so first order algorithms can be the only practical method. Classical references include [NY83, Nes83, Nes84, Nes89], and newer references include [Nes04, JN11a, JN11b]. See also [BT09].

As far as we are aware, the problem (1.1) where projections are used to address the feasibility of solutions is studied in [Ned11, WB15]. In both papers, the approach is to use random projection methods, while the second paper focuses on the generalized setting of variational inequalities.

1.3. Contributions of this paper. In Section 3, we modify the subgradient algorithm in [Nes04, Section 3.2.4] for solving (1.1) so that the new algorithm is more suitable when $m$ is large. When the functions $\{f_j\}_{j=1}^m$ satisfy the linear metric inequality property in Definition 2.4, we show that projection methods can be used instead. The algorithms in this section have $O(1/\sqrt{k})$ convergence rate to the optimal objective value, just like the subgradient algorithm.

The convergence of projection algorithms for the SIP (1.2) is linear when a linear metric inequality condition is satisfied. Furthermore, the convergence of first order algorithms for strongly convex functions with Lipschitz gradient to the objective value and the unique optimal solution is linear. It is therefore natural to look at the convergence rate of algorithms for (1.1) when

(1) the functions $f_j(\cdot)$ satisfy linear metric inequality, and

(2) $f(\cdot)$ is strongly convex, differentiable and has Lipschitz gradient.

In Section 4, we generalize Haugazeau's algorithm to obtain a first order algorithm to solve (1.1) for the case when (1) and (2) are satisfied, and

(3) $f(\cdot)$ is structured enough to optimize over the intersection of two halfspaces.

Our algorithms have an $O(1/k)$ convergence rate to the optimal objective value and $O(1/\sqrt{k})$ convergence to the optimizer. We believe that such a convergence rate for Haugazeau's algorithm is new.

In Section 5, we propose a first order algorithm to solve (1.1) when (1) and (2) are satisfied, but not (3). The convergence rates to the optimal objective value and to the optimizer are slightly worse than those of the algorithms in Section 4.

In Section 6, we show that in the case where the dimension and number of constraints are large, an $O(1/k)$ convergence rate is best possible for strongly convex problems in a model generalizing Haugazeau's algorithm, while an arbitrarily slow convergence rate applies when there is convexity but no strong convexity in the objective function.

In Section 7, we show that the $O(1/k)$ rate of convergence of Haugazeau's algorithm to the objective value occurs even for a very simple example. We give a second example to show that Haugazeau's algorithm converges arbitrarily slowly in the absence of linear metric inequality.

2. Preliminaries 

In this section, we recall some results that will be necessary for the understanding 
of this paper. We start with strongly convex functions. 

Definition 2.1. (Strongly convex functions) We say that $f : \mathbb{R}^n \to \mathbb{R}$ is strongly convex with convexity parameter $\mu$ if

$$ f(y) \ge f(x) + \langle f'(x), y - x \rangle + \frac{\mu}{2}\|x - y\|^2 \text{ for all } x, y \in \mathbb{R}^n. $$

Denote by $\mathcal{S}^{1,1}_{\mu,L}$ the set of all functions $f : \mathbb{R}^n \to \mathbb{R}$ such that $f(\cdot)$ is strongly convex with parameter $\mu$ and $f'(\cdot)$ is Lipschitz with constant $L$.

We recall some standard results and notation on the method of alternating projections that will be used in the rest of the paper.

Lemma 2.2. (Attractive property of projection) Let $C \subset \mathbb{R}^n$ be a closed convex set. Then $P_C : \mathbb{R}^n \to \mathbb{R}^n$ is 1-attracting with respect to $C$:

$$ \|P_C(x) - x\|^2 \le \|x - y\|^2 - \|P_C(x) - y\|^2 \text{ for all } x \in \mathbb{R}^n \text{ and } y \in C. $$

Definition 2.3. (Fejér monotone sequence) Let $C \subset \mathbb{R}^n$ be a closed convex set and let $\{x_k\}$ be a sequence in $\mathbb{R}^n$. We say that $\{x_k\}$ is Fejér monotone with respect to $C$ if

$$ \|x_{k+1} - c\| \le \|x_k - c\| \text{ for all } c \in C \text{ and } k = 1, 2, \dots $$

Consider the SIP (1.2) and the method of alternating projections described shortly after. The 1-attractiveness property leads to the Fejér monotonicity of the sequence $\{x_k\}_{k=1}^\infty$ with respect to $C = \bigcap_{j=1}^m C_j$. The Fejér monotonicity property will be used in the proof of Theorem 3.5.

A stability property that guarantees the linear convergence of the MAP is defined 
below. 

Definition 2.4. (Linear metric inequality) Let $f_j : \mathbb{R}^n \to \mathbb{R}$ be convex functions for $j \in \{1, \dots, m\}$. Let $C := \{x : \max_{1 \le j \le m} f_j(x) \le 0\}$. Let $x \in \mathbb{R}^n$. If $f_j(x) > 0$, then choose any $g_j \in \partial f_j(x)$ and let the halfspace $H_j$ be

$$ H_j := \{y : f_j(x) + \langle g_j, y - x \rangle \le 0\}. $$

Otherwise, let $H_j = \mathbb{R}^n$. We say that $\{f_j(\cdot)\}_{j=1}^m$ satisfies linear metric inequality with parameter $\kappa > 0$ if

$$ d(x, C) \le \kappa \max_{1 \le j \le m} d(x, H_j) \text{ for all } x \in \mathbb{R}^n. \tag{2.1} $$

In the case where $f_j(x) = d(x, C_j)$ for some closed convex set $C_j$ and $x \notin C_j$, we have $\partial f_j(x) = \left\{ \frac{x - P_{C_j}(x)}{\|x - P_{C_j}(x)\|} \right\}$ and $\|x - P_{C_j}(x)\| = d(x, C_j)$. (This fact is well known. See for example [BC11, Proposition 18.22].) So $d(x, H_j) = d(x, C_j)$, and (2.1) reduces to the well known linear metric inequality (which is sometimes referred to as linear regularity) for collections of convex sets. A local version of the linear metric inequality is often defined for the local convergence of projection algorithms. But for this paper, we shall use the global version defined above to simplify our analysis.

2.1. Using quadratic programming to accelerate projection algorithms. One way to accelerate projection algorithms for solving the SIP (1.2) is to collect the halfspaces produced by the projection process and use a quadratic program (QP) to project onto the intersection of these halfspaces. See [Pan15] for more on this acceleration. The material in this subsection can be skipped in understanding the main contributions of the paper. But we feel that a brief mention of this acceleration can be useful because it shows how developments in projection methods for solving the SIP can be incorporated in the algorithms of this paper.

A QP can be written as

$$ \begin{array}{ll} \min & \frac{1}{2}\|x - x_0\|^2 \\ \text{s.t.} & x \in \bigcap_{i=1}^{m} H_i, \end{array} $$

where $H_i$ are halfspaces. If $m$ is small, then the optimal solution can be found with an efficient QP algorithm once the QR factorization of the normals of $H_i$ is obtained.

If $m$ is large, then trying to solve the QP would defeat the purpose of using first order algorithms. We suggest using the dual active set QP algorithm of [GI83]. The $k$th iteration updates a solution $x_k$ and an active set $S_k \subset \{1, \dots, m\}$ such that $x_k \in \partial H_i$ for all $i \in S_k$ and $x_k = P_{\cap_{i \in S_k} H_i}(x_0)$. The algorithm of [GI83] has two advantages:

(1) Each iteration involves relatively cheap updates of the QR factorization of the normals of the active constraints and solving linear systems of size at most $|S_k|$.

(2) The distance $d(x_0, \cap_{i \in S_k} H_i) = \|x_0 - x_k\|$ is strictly increasing till it reaches $d(x_0, \cap_{i=1}^{m} H_i)$.

So if the QP is not solved to optimality, each iteration gives a halfspace $H_k = \{x : \langle x_0 - x_k, x - x_k \rangle \le 0\}$ such that $H_k \supset \cap_{i=1}^{m} H_i$ and $d(x_0, H_k) = d(x_0, \cap_{i \in S_k} H_i)$, which is strictly increasing by property (2). The size of the active set $|S_k|$ can be reduced if some of the halfspaces are aggregated into a single halfspace, just like in the generalized Haugazeau's algorithm in Section 4.

To accelerate an alternating projection strategy, the QR factorization of the normals of the halfspaces containing $x$ (the point where one projects from) can be used to find a separating halfspace that is further away from $x$ at the cost of an iteration of the algorithm in [GI83].

3. A subgradient algorithm for constrained optimization

In this section, we look at how to adapt [Nes04, Theorem 3.2.3] to treat the case where the number of constraints is large. We begin by describing our algorithm.

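Algorithm 3.1. (Subgradient algorithm with feasibility updates) Let $f : \mathbb{R}^n \to \mathbb{R}$ and $f_j : \mathbb{R}^n \to \mathbb{R}$ (where $j \in \{1, \dots, m\}$) be convex functions. Let $Q \subset \mathbb{R}^n$ be a closed convex set, and $R > 0$ be such that $\|x - y\| \le R$ for all $x, y \in Q$. Let

$$ C_j := \{x : f_j(x) \le 0\} \text{ and } C := \{x : f_j(x) \le 0, j = 1, \dots, m\} = \bigcap_{j=1}^{m} C_j. \tag{3.1} $$

This algorithm seeks to solve

$$ \min\{f(x) : x \in Q, f_j(x) \le 0, j = 1, \dots, m\}. $$

01 Step 0. Choose $x_0 \in Q$ and sequence $\{h_k\}_{k=0}^{\infty}$ by

$$ h_k = \frac{R}{\sqrt{k + 0.5}}. \tag{3.2} $$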



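02 Step 1: $k$th iteration ($k \ge 0$). Use either Step 1A or Step 1B to find $x_{k+1}$:
03 Step 1A. (Supporting halfspace from $x_k$).
04 Find $g_{j,k} \in \partial f_j(x_k)$ for all $j \in \{1, \dots, m\}$.
05 Define the halfspace $H_{j,k}$ by

$$ H_{j,k} := \begin{cases} \{x : f_j(x_k) + \langle g_{j,k}, x - x_k \rangle \le 0\} & \text{if } f_j(x_k) > 0 \\ \mathbb{R}^n & \text{otherwise.} \end{cases} $$

06 If $d(x_k, H_{j,k}) \le h_k$ for all $j$, then find $g_k \in \partial f(x_k)$ and set

$$ x_{k+1} = P_Q\left(x_k - h_k \frac{g_k}{\|g_k\|}\right). \tag{3.3} $$

07 Otherwise there is a halfspace $H_k$
08 such that $C \subset H_k$ and $d(x_k, H_k) > h_k$. Set

$$ x_{k+1} = P_Q \circ P_{H_k}(x_k). $$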


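09 Step 1B. (Alternating projection strategy)
10 Let $x_k^0 = x_k$.
11 For $j = 1, \dots, m$
12 Find $g_{j,k} \in \partial f_j(x_k^{j-1})$.
13 Define the halfspace $H_{j,k}$ by

$$ H_{j,k} := \begin{cases} \{x : f_j(x_k^{j-1}) + \langle g_{j,k}, x - x_k^{j-1} \rangle \le 0\} & \text{if } f_j(x_k^{j-1}) > 0 \\ \mathbb{R}^n & \text{otherwise.} \end{cases} \tag{3.4} $$

14 Set $S_{j,k}$ to be a subset of $\{1, \dots, j\}$ such that $j \in S_{j,k}$.
15 Set $x_k^j = P_{\cap_{i \in S_{j,k}} H_{i,k}}(x_k^{j-1})$.
16 End For.
17 If at any point $\sum_{i=1}^{j} \|x_k^i - x_k^{i-1}\|^2 > h_k^2$, then set $x_{k+1} = P_Q(x_k^j)$.
18 Otherwise, choose $g_k \in \partial f(x_k^m)$ and set

$$ x_{k+1} = P_Q\left(x_k^m - h_k \frac{g_k}{\|g_k\|}\right). \tag{3.5} $$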

We make a few remarks about Algorithm 3.1. Algorithm 3.1 is adapted from [Nes04, Theorem 3.2.3] so that if $m$ is large, then one may only need to evaluate a few of $f_j(x_k)$ and $g_{j,k}$, where $j \in \{1, \dots, m\}$, in the $k$th iteration to find $x_{k+1}$.
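
To make the flow of Step 1A concrete, here is a minimal sketch for a linear objective and linear constraints, with $Q$ a Euclidean ball. It assumes the Nesterov-style step rule $h_k = R/\sqrt{k + 0.5}$ and uses the most violated constraint halfspace as the separating halfspace $H_k$; the function name, the test problem and these choices are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def subgradient_with_feasibility(c, A, b, radius, iters):
    """Sketch of Step 1A for min <c, x> s.t. A x <= b, ||x|| <= radius."""
    R = 2.0 * radius                          # diameter of Q (a ball)
    row_norms = np.linalg.norm(A, axis=1)
    x = np.zeros(A.shape[1])                  # x_0 in Q
    records = []                              # (k, f(x_k), max_j f_j(x_k)) at objective steps
    for k in range(iters):
        h = R / np.sqrt(k + 0.5)              # assumed step length rule
        resid = A @ x - b                     # values f_j(x_k)
        dist = np.maximum(resid, 0.0) / row_norms   # d(x_k, H_{j,k})
        j = int(np.argmax(dist))
        if dist[j] <= h:
            # All halfspaces are within h_k: take a normalized subgradient step.
            records.append((k, float(c @ x), float(resid.max())))
            y = x - h * c / np.linalg.norm(c)
        else:
            # Feasibility update: project onto the most violated halfspace.
            y = x - dist[j] * A[j] / row_norms[j]
        norm_y = np.linalg.norm(y)
        x = y if norm_y <= radius else radius * y / norm_y   # projection onto Q
    return records

# Hypothetical test problem: min x_1 + 2 x_2 s.t. x >= 0, x_1 + x_2 >= 1.
c = np.array([1.0, 2.0])
A = np.array([[-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0]])
b = np.array([0.0, 0.0, -1.0])
records = subgradient_with_feasibility(c, A, b, radius=2.0, iters=2000)
```

The optimal value of the test problem is 1, attained at $(1, 0)$. In this sketch every constraint is evaluated at each iteration for simplicity; the point of the algorithm is that in practice one may stop evaluating as soon as a sufficiently distant halfspace is found.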

Remark 3.2. (Using quadratic programming to accelerate projection algorithms) The set $S_{j,k}$ in Step 1B can be chosen to be $S_{j,k} = \{j\}$, and Step 1B would then correspond to an alternating projection strategy. But if the size $|S_{j,k}|$ is small, then each step can still be carried out quickly. Depending on the orientation of the sets $C_j$ (see (3.1)), choosing a larger set $S_{j,k}$ can accelerate the convergence of the algorithm, as the intersection of more than one of the halfspaces $H_{j,k}$ would be a better approximation of the set $C$ than a set of the form $C_j$. The strategies outlined in Subsection 2.1 can be applied.

In order to accelerate convergence, we can take $x_{k+1} = P_{\cap_{i \in S_k} H_i} \circ P_Q(y_k)$ in (3.3) and (3.5), where $S_k \subset \{1, \dots, m\}$ and $y_k$ is the formula in $P_Q(\cdot)$. A halfspace separating $P_Q(y_k)$ from $C$ can be found with the strategies in Subsection 2.1.


Remark 3.3. (Choices of $x_{k+1}$ in Step 1A) In Step 1A of Algorithm 3.1, it is possible that $\max_{1 \le j \le m} d(x_k, H_{j,k}) \le h_k$ and there is a halfspace $H_k$ such that $\cap_{j=1}^{m} H_{j,k} \subset H_k$ and $d(x_k, H_k) > h_k$. The halfspace $H_k$ satisfying the required properties can be found (by the strategies outlined in Subsection 2.1 for example) before all the distances $d(x_k, H_{j,k})$, where $j \in \{1, \dots, m\}$, are evaluated, so one would carry out the projection step $x_{k+1} = P_Q \circ P_{H_k}(x_k)$ in such a case.

Remark 3.4. (Order of evaluating $f_j(\cdot)$s) In both Steps 1A and 1B, we do not have to loop through the functions $\{f_j(\cdot)\}_{j=1}^{m}$ in the sequential order. The functions $\{f_j(\cdot)\}_{j=1}^{m}$ can be handled in any order that goes through all the indices in $\{1, \dots, m\}$. If $j \in \{1, \dots, m\}$ is such that $f_j(x^*) < 0$ for all optimal solutions $x^*$, then $f_j(\cdot)$ shall be evaluated infrequently. One can also incorporate ideas in [HC08] to find a good order to cycle through the indices $\{1, \dots, m\}$.

Step 1B describes an extended alternating projection procedure to find a point that is close to $C$. In view of studies in alternating projections, one is more likely to achieve feasibility by projecting from the most recently evaluated point $x_k^{j-1}$ instead of $x_k$.

Theorem 3.5. (Convergence of Algorithm 3.1) Consider Algorithm 3.1. Let $x^*$ be some optimal solution. Let $f(\cdot)$ be Lipschitz continuous on $B(x^*, R)$ with constant $M_1$ and let $M_2$ be

$$ M_2 = \max_{1 \le j \le m} \{\|g\| : g \in \partial f_j(x), x \in B(x^*, R)\}. $$

(a) If Step 1A was carried throughout, then for any $k \ge 3$, there exists a number $i'$, $0 \le i' \le k$, such that

$$ f(x_{i'}) - f^* \le \frac{\sqrt{3} M_1 R}{\sqrt{k - 1.5}} \text{ and } \max\{f_j(x_{i'}) : j \in \{1, \dots, m\}\} \le \frac{\sqrt{3} M_2 R}{\sqrt{k - 1.5}}. \tag{3.6} $$

(b) Recall the definition of $C$ in (3.1). If Step 1B was carried throughout, $Q = \mathbb{R}^n$ and the linear metric inequality condition is satisfied for some constant $\kappa < \infty$, then there exists a number $i'$, $0 \le i' \le k$, such that

$$ f(x_{i'}^m) - f^* \le \frac{\sqrt{3} M_1 R}{\sqrt{k - 1.5}} \text{ and } d(x_{i'}^m, C) \le \frac{\sqrt{3} \kappa \sqrt{m} R}{\sqrt{k - 1.5}}. $$

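Proof. We first prove the result for Step 1A. Let $k' = \lfloor k/3 \rfloor$ and

$$ I_k = \left\{ i \in [k', \dots, k] : x_{i+1} = P_Q\left(x_i - h_i \frac{g_i}{\|g_i\|}\right) \right\}. \tag{3.7} $$

When $i \notin I_k$, we have $\|x_{i+1} - x^*\|^2 \le \|x_i - x^*\|^2 - h_i^2$ from the 1-attractiveness of the projection operation. When $i \in I_k$, we have

$$ \|x_{i+1} - x^*\|^2 \le \left\| x_i - h_i \frac{g_i}{\|g_i\|} - x^* \right\|^2 = \|x_i - x^*\|^2 + h_i^2 - 2 h_i \left\langle \frac{g_i}{\|g_i\|}, x_i - x^* \right\rangle. \tag{3.8} $$

Summing up these inequalities for $i \in [k', \dots, k]$ gives

$$ \|x_{k+1} - x^*\|^2 \le \|x_{k'} - x^*\|^2 - \sum_{i \in I_k} \left[ 2 h_i \left\langle \frac{g_i}{\|g_i\|}, x_i - x^* \right\rangle - h_i^2 \right] - \sum_{i \notin I_k} h_i^2. $$

Let $v_i = \langle \frac{g_i}{\|g_i\|}, x_i - x^* \rangle$. Seeking a contradiction, assume that $v_i > h_i$ for all $i \in I_k$. Then

$$ \sum_{i \in I_k} h_i^2 + \sum_{i \notin I_k} h_i^2 = \sum_{i=k'}^{k} h_i^2 \le \|x_{k'} - x^*\|^2 \le R^2, $$

which gives

$$ 1 \ge \sum_{i=k'}^{k} \frac{1}{i + 0.5} \ge \int_{k'}^{k+1} \frac{d\tau}{\tau + 0.5} = \ln \frac{2k + 3}{2k' + 1} \ge \ln 3. $$

This is a contradiction. Thus $I_k \neq \emptyset$ and there exists some $i' \in I_k$ such that $v_{i'} \le h_{i'}$. Clearly, for this number we have $v_{i'} \le h_{k'}$. Lemma 3.2.1 in [Nes04] shows that $f(x_{i'}) - f(x^*) \le M_1 \max\{0, v_{i'}\}$. So

$$ f(x_{i'}) - f(x^*) \le M_1 h_{k'}, \tag{3.9} $$

which implies the first part of (3.6).

We now prove the second part of (3.6). Since $i' \in I_k$, we have $d(x_{i'}, H_{j,i'}) \le h_{i'}$ for all $j \in \{1, \dots, m\}$. We can calculate that $d(x_{i'}, H_{j,i'}) = \frac{f_j(x_{i'})}{\|g_{j,i'}\|}$ whenever $f_j(x_{i'}) > 0$. Therefore, $\frac{f_j(x_{i'})}{\|g_{j,i'}\|} \le h_{i'}$, which gives $f_j(x_{i'}) \le M_2 h_{i'} \le M_2 h_{k'}$. It remains to note that $k' \ge \frac{k}{3} - 1$, and therefore $h_{k'} \le \frac{\sqrt{3} R}{\sqrt{k - 1.5}}$. This ends the proof of (a).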

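We now go on to prove the result if Step 1B had been used throughout the algorithm. Once again, let $k' = \lfloor k/3 \rfloor$ and

$$ I_k = \left\{ i \in [k', \dots, k] : x_{i+1} = P_Q\left(x_i^m - h_i \frac{g_i}{\|g_i\|}\right) \right\}. $$

If $i \notin I_k$, we still have $\|x_{i+1} - x^*\|^2 \le \|x_i - x^*\|^2 - h_i^2$. If $i \in I_k$, we have $\|x_i^m - x^*\| \le \|x_i - x^*\|$, and we can use arguments similar to (3.8) to get

$$ \|x_{i+1} - x^*\|^2 \le \|x_i - x^*\|^2 + h_i^2 - 2 h_i \left\langle \frac{g_i}{\|g_i\|}, x_i^m - x^* \right\rangle. $$

Define $v_i = \langle \frac{g_i}{\|g_i\|}, x_i^m - x^* \rangle$. By the same reasoning as before, there is some $i' \in I_k$ such that $v_{i'} \le h_{i'} \le h_{k'}$. By the reasoning in (3.9), we have

$$ f(x_{i'}^m) - f(x^*) \le M_1 h_{k'}. $$

To obtain the other inequality, we note that $d(x_{i'}^{j-1}, C_j) \le \|x_{i'}^{j-1} - x_{i'}^{j}\|$ for any $j \in \{1, \dots, m\}$. Thus for any $j \in \{1, \dots, m\}$, we have

$$ d(x_{i'}, C_j) \le \sum_{l=1}^{j} \|x_{i'}^{l} - x_{i'}^{l-1}\| \le \sqrt{m} \sqrt{\sum_{l=1}^{m} \|x_{i'}^{l} - x_{i'}^{l-1}\|^2} \le \sqrt{m} h_{i'}. $$

In view of linear metric inequality, we thus have

$$ d(x_{i'}, C) \le \kappa \max_{j \in \{1, \dots, m\}} d(x_{i'}, C_j) \le \kappa \sqrt{m} h_{i'}. $$

By Fejér monotonicity, we have $d(x_{i'}^m, C) \le \kappa \sqrt{m} h_{i'}$. Like before, $h_{i'} \le h_{k'} \le \frac{\sqrt{3} R}{\sqrt{k - 1.5}}$. The proof is complete. □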

4. Convergence rate of generalized Haugazeau’s algorithm 


One method of solving the BAP (1.3) is Haugazeau's algorithm. In this section, we show that a generalized Haugazeau's algorithm has $O(1/k)$ convergence to the optimal value and $O(1/\sqrt{k})$ convergence to the optimal solution when the linear metric inequality assumption is satisfied.

Algorithm 4.1. (Generalized Haugazeau's algorithm) Let $f : \mathbb{R}^n \to \mathbb{R}$ be in $\mathcal{S}^{1,1}_{\mu,L}$, where $\mu > 0$. For a point $x_0$ and several continuous convex functions $f_j : \mathbb{R}^n \to \mathbb{R}$, where $j \in \{1, \dots, m\}$, we want to find the minimizer of $f(\cdot)$ on

$$ C := \bigcap_{j=1}^{m} \{x : f_j(x) \le 0\}. $$

Suppose the linear metric inequality assumption is satisfied.

(A choice of $f(\cdot)$ is $f(x) := \frac{1}{2}\|x - x_0\|^2$, where $x_0$ is some point in $\mathbb{R}^n$.)

01 Step 0: Let $H_0^\circ = \mathbb{R}^n$.
02 Let $x_0$ be the minimizer of $f(\cdot)$ on $\mathbb{R}^n$.
03 For iteration $k = 0, 1, 2, \dots$
04 Step 1 (Find a halfspace of largest distance from $x_k$):
05 For $j \in \{1, \dots, m\}$,
06 Find $g_{j,k} \in \partial f_j(x_k)$
07 Let $H_{j,k}$ be the set

$$ H_{j,k} := \begin{cases} \{x : f_j(x_k) + \langle g_{j,k}, x - x_k \rangle \le 0\} & \text{if } f_j(x_k) > 0 \\ \mathbb{R}^n & \text{otherwise.} \end{cases} $$

08 Let $\bar{j} \in \{1, \dots, m\}$ be such that $d(x_k, H_{\bar{j},k}) = \max_j d(x_k, H_{j,k})$.
09 Let $H_k^+ = H_{\bar{j},k}$.
10 end for
11 Step 2:
12 Find the minimizer $x_{k+1}$ of $f(\cdot)$ on $H_k^\circ \cap H_k^+$.
13 Let $H_{k+1}^\circ = \{x : \langle -f'(x_{k+1}), x - x_{k+1} \rangle \le 0\}$.
14 End for


The halfspace $H_{k+1}^\circ$ in Step 2 is designed so that $x_{k+1}$ is the minimizer of $f(\cdot)$ on $H_{k+1}^\circ$. Finding the index $\bar{j}$ such that $d(x_k, H_{\bar{j},k}) = \max_j d(x_k, H_{j,k})$ in Step 1 can be prohibitively expensive if $m$ is large, so the alternative algorithm below is more reasonable.


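Algorithm 4.2. (Alternative algorithm) For the same setting as Algorithm 4.1, we propose a different algorithm.

01 Step 0: Let $H_0^\circ$ be $\mathbb{R}^n$, and let $x_0$ be the minimizer of $f(\cdot)$ on $\mathbb{R}^n$.
02 Let $k = 0$.
03 Step 1: Set $x_k^0 = x_k$ and $H_{0,k}^\circ = H_k^\circ$.
04 For $j = 1, \dots, m$
05 Find $g_{j,k} \in \partial f_j(x_k^{j-1})$ and set

$$ H_{j,k}^+ := \begin{cases} \{x : f_j(x_k^{j-1}) + \langle g_{j,k}, x - x_k^{j-1} \rangle \le 0\} & \text{if } f_j(x_k^{j-1}) > 0 \\ \mathbb{R}^n & \text{otherwise.} \end{cases} $$

06 Find the minimizer $x_k^j$ of $f(\cdot)$ on $H_{j-1,k}^\circ \cap H_{j,k}^+$.
07 Let $H_{j,k}^\circ = \{x : \langle -f'(x_k^j), x - x_k^j \rangle \le 0\}$.
08 end for
09 Step 2: Set $x_{k+1} = x_k^m$.
10 Set $k \leftarrow k + 1$ and go back to Step 1.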

Remark 4.3. (Quadratic case in Algorithm 4.1) We discuss the particular case when $f(x) := \frac{1}{2}\|x - x_0\|^2$. In other words, the optimization problem is the BAP (1.3). In this case, Algorithm 4.1 reduces to Haugazeau's algorithm. The problem of minimizing $f(\cdot)$ on the intersection of two halfspaces is easy enough to solve analytically. Note that throughout Algorithm 4.1, the halfspaces $H_k^\circ$ and $H_k^+$ contain the set $C$. One can choose to keep more halfspaces containing $C$ and in Step 2, find the minimizer of $f(\cdot)$ on the intersection of a larger number of halfspaces. The convergence would be accelerated at the price of solving larger quadratic programs. One can also apply the strategies in Subsection 2.1.
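
For the quadratic case $f(x) = \frac{1}{2}\|x - a\|^2$ with linear constraints $\langle a_j, x \rangle \le b_j$, the iteration can be sketched as follows. The sketch writes the anchor point as `anchor`, represents a halfspace $\{x : \langle u, x \rangle \le c\}$ by the pair `(u, c)`, and solves the two-halfspace subproblem by checking the three single-constraint candidates before projecting onto both boundary hyperplanes. All names and the test problem are illustrative assumptions rather than the author's code.

```python
import numpy as np

def proj_halfspace(p, u, c):
    """Projection of p onto {x : <u, x> <= c}; u = 0, c = 0 encodes R^n."""
    viol = u @ p - c
    return p if viol <= 0 else p - viol * u / (u @ u)

def proj_two_halfspaces(p, h1, h2, tol=1e-12):
    """Projection of p onto the intersection of two halfspaces."""
    (u1, c1), (u2, c2) = h1, h2
    for q in (p, proj_halfspace(p, u1, c1), proj_halfspace(p, u2, c2)):
        if u1 @ q - c1 <= tol and u2 @ q - c2 <= tol:
            return q
    # Both constraints are active: project onto the two boundary hyperplanes.
    U = np.vstack([u1, u2])
    lam = np.linalg.solve(U @ U.T, U @ p - np.array([c1, c2]))
    return p - U.T @ lam

def haugazeau(anchor, A, b, iters=100):
    """Sketch of Haugazeau's iteration for projecting anchor onto {A x <= b}."""
    row_norms = np.linalg.norm(A, axis=1)
    x = anchor.copy()
    h_circ = (np.zeros_like(anchor), 0.0)        # aggregated halfspace, initially R^n
    for _ in range(iters):
        dist = np.maximum(A @ x - b, 0.0) / row_norms
        j = int(np.argmax(dist))                 # halfspace of largest distance
        if dist[j] == 0.0:
            break                                # x is feasible, hence optimal
        x = proj_two_halfspaces(anchor, h_circ, (A[j], b[j]))
        normal = anchor - x                      # equals -f'(x) in the quadratic case
        h_circ = (normal, normal @ x)
    return x

# Hypothetical example: project (1, 2) onto the nonpositive orthant.
x_proj = haugazeau(np.array([1.0, 2.0]), np.eye(2), np.zeros(2))
```

In this small example the iteration terminates after two projection steps at the origin. Keeping only the aggregated halfspace and the newest separating halfspace is what keeps each subproblem analytically solvable.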

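The lemma below is useful in the proof of Theorem 4.8.

Lemma 4.4. (Convergence rate of a sequence) Suppose $\{\delta_k\}_k \subset \mathbb{R}$ is a sequence of nonnegative real numbers satisfying

$$ \delta_{k+1} \le \delta_k - \epsilon_1 \left[ 1 - \sqrt{1 - \epsilon_2 \delta_k} \right]^2 + \frac{\alpha}{k^2}, $$

where $\epsilon_1, \epsilon_2 > 0$, $\alpha \ge 0$ and $\epsilon_2 \delta_1 \le 1$. Let $\epsilon = \frac{1}{4} \epsilon_1 \epsilon_2^2$.

(a) The convergence of $\{\delta_k\}_k$ to zero is $O(1/k)$.

(b) If $\alpha = 0$ and $\delta_k > 0$ for all $k$, then $\{\delta_k\}_k$ is strictly decreasing, and

$$ \delta_k \le \frac{1}{\frac{1}{\delta_0} + \epsilon k} \text{ for all } k \ge 0. $$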

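Proof. We first prove (a). Suppose the values $r > 0$ and $\bar{\epsilon} > 0$ are small enough so that $\delta_1 \le \frac{1}{r}$, $\bar{\epsilon} \le \epsilon$, $\delta_1 \le \frac{1}{2\bar{\epsilon}}$ and $r^2 \alpha + r \le \bar{\epsilon}$. Suppose $\delta_k \le \frac{1}{rk}$. Then by the monotonicity of the function $\delta \mapsto \delta - \bar{\epsilon} \delta^2$ in the range $\delta \in [0, \frac{1}{2\bar{\epsilon}}]$, we have

$$ \delta_{k+1} \le \delta_k - \bar{\epsilon} \delta_k^2 + \frac{\alpha}{k^2} \le \frac{1}{rk} - \frac{\bar{\epsilon}}{r^2 k^2} + \frac{\alpha}{k^2} = \frac{rk - \bar{\epsilon} + r^2 \alpha}{r^2 k^2} \le \frac{rk - r}{r^2 k^2} \le \frac{1}{r(k + 1)}. $$

Thus $\{\delta_k\}_k \in O(1/k)$ as needed.

We now prove (b). Like in (a), we have $\delta_{k+1} \le \delta_k - \frac{1}{4} \epsilon_1 \epsilon_2^2 \delta_k^2 = \delta_k - \epsilon \delta_k^2$. It is clear that $\{\delta_k\}_k$ is a strictly decreasing sequence if all terms are positive. Let $\theta_k = \frac{1}{\delta_k}$ so that $\delta_k = \frac{1}{\theta_k}$. Then

$$ \frac{1}{\theta_{k+1}} = \delta_{k+1} \le \delta_k - \epsilon \delta_k^2 = \frac{1}{\theta_k} - \frac{\epsilon}{\theta_k^2} = \frac{\theta_k - \epsilon}{\theta_k^2} \le \frac{1}{\theta_k + \epsilon}. $$

In other words, $\theta_{k+1} \ge \theta_k + \epsilon$. The conclusion is now straightforward. □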
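
As a quick numerical sanity check of a bound of the form $\delta_k \le 1/(1/\delta_0 + \epsilon k)$ with $\epsilon = \frac{1}{4}\epsilon_1 \epsilon_2^2$, one can iterate the recurrence $\delta_{k+1} = \delta_k - \epsilon_1 [1 - \sqrt{1 - \epsilon_2 \delta_k}]^2$ with equality, which is the slowest decrease the recurrence allows when no additive term is present. The parameter values below are arbitrary illustrative choices.

```python
import math

def worst_case_sequence(delta0, eps1, eps2, steps):
    """Iterate delta_{k+1} = delta_k - eps1 * (1 - sqrt(1 - eps2 * delta_k))**2."""
    deltas = [delta0]
    for _ in range(steps):
        d = deltas[-1]
        deltas.append(d - eps1 * (1.0 - math.sqrt(1.0 - eps2 * d)) ** 2)
    return deltas

eps1, eps2, delta0 = 0.5, 1.0, 0.9      # hypothetical values with eps2 * delta0 < 1
eps = 0.25 * eps1 * eps2 ** 2
deltas = worst_case_sequence(delta0, eps1, eps2, 1000)
bounds = [1.0 / (1.0 / delta0 + eps * k) for k in range(len(deltas))]
```

Comparing `deltas` with `bounds` entry by entry shows the sequence staying below the $1/k$-type envelope while decreasing strictly.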


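Lemma 4.5. (Distance to supporting halfspace) Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is a differentiable strongly convex function with parameter $\mu$. Let $x, \bar{x} \in \mathbb{R}^n$ be such that $f(x) < f(\bar{x})$ and $f'(\bar{x}) \neq 0$. Define the halfspace $H$ by $H := \{y : \langle f'(\bar{x}), y - \bar{x} \rangle \ge 0\}$. Then the following hold:

(a) $d(x, H) \ge \frac{1}{\mu} \left[ \|f'(\bar{x})\| - \sqrt{\|f'(\bar{x})\|^2 - 2\mu [f(\bar{x}) - f(x)]} \right]$.

(b) If $\langle f'(x), \bar{x} - x \rangle \ge 0$, then $\|x - \bar{x}\| \le \sqrt{\frac{2}{\mu} [f(\bar{x}) - f(x)]}$.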


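Proof. We first prove (a). We look to solve

$$ \begin{array}{ll} \min\limits_{y} & \left\langle \frac{-f'(\bar{x})}{\|f'(\bar{x})\|}, y - \bar{x} \right\rangle \\ \text{s.t.} & f(\bar{x}) + \langle f'(\bar{x}), y - \bar{x} \rangle + \frac{\mu}{2}\|y - \bar{x}\|^2 \le f(x). \end{array} $$

For any $y \in \mathbb{R}^n$, a lower bound on $f(y)$ is $f(\bar{x}) + \langle f'(\bar{x}), y - \bar{x} \rangle + \frac{\mu}{2}\|y - \bar{x}\|^2$ by strong convexity. Thus if $y$ is such that $f(y) = f(x)$, it must satisfy the constraint of the above problem. The objective value is $d(y, H)$. So this optimization problem finds a lower bound on the distance to the halfspace $H$ from any point whose function value is at most $f(x)$.

We rewrite the constraint to get

$$ f(\bar{x}) + \langle f'(\bar{x}), y - \bar{x} \rangle + \frac{\mu}{2}\|y - \bar{x}\|^2 \le f(x) $$
$$ \Leftrightarrow \quad \frac{\mu}{2} \left\| y - \bar{x} + \frac{1}{\mu} f'(\bar{x}) \right\|^2 \le f(x) - f(\bar{x}) + \frac{1}{2\mu} \|f'(\bar{x})\|^2. $$

The feasible set of the optimization problem is thus a ball with center $\bar{x} - \frac{1}{\mu} f'(\bar{x})$. The optimization problem can be solved analytically by finding the $t$ with smallest absolute value such that $y = \bar{x} + t f'(\bar{x})$ lies on the boundary of the ball. In other words,

$$ f(\bar{x}) + \langle f'(\bar{x}), [\bar{x} + t f'(\bar{x})] - \bar{x} \rangle + \frac{\mu}{2} \|[\bar{x} + t f'(\bar{x})] - \bar{x}\|^2 = f(x) $$
$$ \Leftrightarrow \quad \frac{\mu}{2} t^2 \|f'(\bar{x})\|^2 + t \|f'(\bar{x})\|^2 + f(\bar{x}) - f(x) = 0. $$

So

$$ t = \frac{-\|f'(\bar{x})\|^2 + \|f'(\bar{x})\| \sqrt{\|f'(\bar{x})\|^2 - 2\mu [f(\bar{x}) - f(x)]}}{\mu \|f'(\bar{x})\|^2} = -\frac{1}{\mu} + \frac{1}{\mu \|f'(\bar{x})\|} \sqrt{\|f'(\bar{x})\|^2 - 2\mu [f(\bar{x}) - f(x)]}. $$

The distance of $x$ to $H$ is thus at least $\frac{1}{\mu} \left[ \|f'(\bar{x})\| - \sqrt{\|f'(\bar{x})\|^2 - 2\mu [f(\bar{x}) - f(x)]} \right]$ as needed, which concludes the proof of (a).

Next, we prove (b). By strong convexity and the given assumption, we have

$$ f(\bar{x}) \ge f(x) + \langle f'(x), \bar{x} - x \rangle + \frac{\mu}{2}\|\bar{x} - x\|^2 \ge f(x) + \frac{\mu}{2}\|x - \bar{x}\|^2. $$

A rearrangement of the above gives the conclusion we need. □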
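
A lower bound of the form $d(x, H) \ge \frac{1}{\mu}\left[\|f'(\bar{x})\| - \sqrt{\|f'(\bar{x})\|^2 - 2\mu[f(\bar{x}) - f(x)]}\right]$ can be checked numerically on a simple strongly convex function. The sketch below assumes $f(x) = \frac{1}{2}\|x - a\|^2$ (so $\mu = 1$) with a randomly drawn center $a$; the sampling scheme and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=3)
mu = 1.0
f = lambda x: 0.5 * np.dot(x - a, x - a)
grad = lambda x: x - a

def check_pair(x, x_bar):
    """Return (d(x, H), lower bound) for H = {y : <f'(x_bar), y - x_bar> >= 0}."""
    g = grad(x_bar)
    g_norm = np.linalg.norm(g)
    dist = max(0.0, -(g @ (x - x_bar))) / g_norm
    gap = f(x_bar) - f(x)
    bound = (g_norm - np.sqrt(max(g_norm ** 2 - 2.0 * mu * gap, 0.0))) / mu
    return dist, bound

results = []
for _ in range(1000):
    x, x_bar = rng.normal(size=3), rng.normal(size=3)
    if f(x) < f(x_bar):                 # the bound is stated for f(x) < f(x_bar)
        results.append(check_pair(x, x_bar))
```

For this particular $f$ the bound simplifies to $\|\bar{x} - a\| - \|x - a\|$, which makes the inequality easy to confirm by hand as well.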

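Before we prove Theorem 4.8, we need the following definition.

Definition 4.6. (Triangular property) Consider the function $f_j : \mathbb{R}^n \to \mathbb{R}$ in Algorithm 4.2 for some $j \in \{1, \dots, m\}$. We say that $f_j : \mathbb{R}^n \to \mathbb{R}$ has the triangular property if for any $y, z \in \mathbb{R}^n$ and any $g_y \in \partial f_j(y)$ and $g_z \in \partial f_j(z)$, we have

$$ d(y, H_y) \le \|y - z\| + d(z, H_z), \tag{4.1} $$

where

$$ H_y := \begin{cases} \{x : f_j(y) + \langle g_y, x - y \rangle \le 0\} & \text{if } f_j(y) > 0 \\ \mathbb{R}^n & \text{if } f_j(y) \le 0, \end{cases} $$

and $H_z$ is defined similarly.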

If $f_j : \mathbb{R}^n \to \mathbb{R}$ is defined by $f_j(x) = d(x, C_j)$ for a closed convex set $C_j \subset \mathbb{R}^n$, then $d(y, H_y) = d(y, C_j)$ and $d(z, H_z) = d(z, C_j)$, so (4.1) obviously holds. However, the triangular property need not hold for every convex function.

Example 4.7. (Failure of triangular property) Let $f_j : \mathbb{R} \to \mathbb{R}$ be defined by $f_j(x) = \max\{x, 2x - 1\}$. If $y = 0.9$ and $z = 1.1$, we can check that $d(y, H_y) = 0.9$, $d(z, H_z) = 0.6$ and $\|y - z\| = 0.2$, which means that (4.1) cannot hold.
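
The numbers in this example can be reproduced with a few lines. The helper name below is hypothetical; it computes the distance from a point $t$ to the linearization halfspace of $\max\{t, 2t - 1\}$ at $t$, which on the real line equals the function value divided by the absolute slope whenever the value is positive.

```python
def halfspace_distance(x):
    """d(x, H_x) for f(t) = max(t, 2t - 1) on the real line."""
    value = max(x, 2 * x - 1)
    if value <= 0:
        return 0.0                           # H_x is the whole line
    slope = 1.0 if x > 2 * x - 1 else 2.0    # subgradient (unique away from t = 1)
    return value / abs(slope)

y, z = 0.9, 1.1
lhs = halfspace_distance(y)                  # d(y, H_y)
rhs = abs(y - z) + halfspace_distance(z)     # ||y - z|| + d(z, H_z)
```

Here `lhs` exceeds `rhs`, so the triangular inequality fails for this pair of points.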

We now prove the convergence of Algorithms 4.1 and 4.2.


Theorem 4.8. (Convergence rate of Algorithm 4.1) Consider the setting in Algorithm 4.1. Suppose the linear metric inequality is satisfied. Let $x^*$ be the optimal solution to $\min\{f(x) : x \in C\}$, and assume that $f'(x^*) \neq 0$.

(a) In Algorithm 4.1, the convergence of $\{f(x_k)\}_{k=1}^{\infty}$ to $f(x^*)$ satisfies

$$ f(x^*) - f(x_k) \le \frac{1}{\frac{1}{f(x^*) - f(x_0)} + \epsilon k}, $$

where $\epsilon = \frac{\mu}{2\kappa^2 \|f'(x^*)\|^2}$, and the convergence of $\{x_k\}_{k=1}^{\infty}$ to $x^*$ satisfies

$$ \|x^* - x_k\| \le \sqrt{\frac{2}{\mu} [f(x^*) - f(x_k)]}. \tag{4.2} $$

Thus the convergence of $\{f(x_k)\}_{k=1}^{\infty}$ to $f(x^*)$ is $O(1/k)$, and the convergence of $\{x_k\}_{k=1}^{\infty}$ to $x^*$ is $O(1/\sqrt{k})$.

(b) Suppose in addition that the triangular property holds. In Algorithm 4.2, the convergence of $\{f(x_k)\}_{k=1}^{\infty}$ to $f(x^*)$ satisfies

$$ f(x^*) - f(x_k) \le \frac{1}{\frac{1}{f(x^*) - f(x_0)} + \frac{\epsilon}{m} k}, $$

where $\epsilon$ is the same as in (a), and the convergence of $\{x_k\}_{k=1}^{\infty}$ to $x^*$ satisfies (4.2).
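Proof. We first prove part (a). Consider the halfspace

$$ H^* := \{x : \langle -f'(x^*), x - x^* \rangle \le 0\}. $$

The halfspace $H^*$ contains $C$, and contains $x^*$ on its boundary. It is clear that $\{f(x_k)\}_{k=1}^{\infty}$ is an increasing sequence such that $\lim_{k \to \infty} f(x_k) = f(x^*)$.

By Lemma 4.5(a), we have

$$ d(x_k, H^*) \ge \frac{1}{\mu} \left[ \|f'(x^*)\| - \sqrt{\|f'(x^*)\|^2 - 2\mu [f(x^*) - f(x_k)]} \right]. \tag{4.3} $$

By linear metric inequality, we can find a separating halfspace from $x_k$ to $C$ that is of distance at least $\frac{1}{\kappa\mu} \left[ \|f'(x^*)\| - \sqrt{\|f'(x^*)\|^2 - 2\mu [f(x^*) - f(x_k)]} \right]$ from $x_k$. Thus

$$ \|x_{k+1} - x_k\| \ge \frac{1}{\kappa\mu} \left[ \|f'(x^*)\| - \sqrt{\|f'(x^*)\|^2 - 2\mu [f(x^*) - f(x_k)]} \right]. $$

The next iterate $x_{k+1}$ lies in the set $H_k^\circ \cap H_k^+$, so $\langle f'(x_k), x_{k+1} - x_k \rangle \ge 0$. By the $\mu$-strong convexity of $f$, we have

$$ f(x_{k+1}) \ge f(x_k) + \langle f'(x_k), x_{k+1} - x_k \rangle + \frac{\mu}{2}\|x_{k+1} - x_k\|^2 \tag{4.4} $$
$$ \ge f(x_k) + \frac{\mu}{2}\|x_{k+1} - x_k\|^2, $$

so

$$ f(x_{k+1}) - f(x_k) \ge \frac{1}{2\kappa^2 \mu} \left[ \|f'(x^*)\| - \sqrt{\|f'(x^*)\|^2 - 2\mu [f(x^*) - f(x_k)]} \right]^2. $$

Let $\delta_k = f(x^*) - f(x_k)$. From the above, we have

$$ \delta_{k+1} \le \delta_k - \epsilon_1 \left[ 1 - \sqrt{1 - \epsilon_2 \delta_k} \right]^2, $$

where $\epsilon_1 = \frac{\|f'(x^*)\|^2}{2\kappa^2 \mu}$ and $\epsilon_2 = \frac{2\mu}{\|f'(x^*)\|^2}$. Applying Lemma 4.4(b) gives the first part of our result. Next, note that $x^*$ lies in the halfspace through $x_k$ with outward normal $-f'(x_k)$, so this gives $\langle f'(x_k), x^* - x_k \rangle \ge 0$. We use Lemma 4.5(b) to get (4.2).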

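We now prove part (b). Like before, Lemma 4.5(a) applies to give (4.3). By linear metric inequality with parameter $\kappa$, there is an index $\bar{j} \in \{1, \dots, m\}$ such that for any $s \in \partial f_{\bar{j}}(x_k)$, the halfspace $H_{\bar{j}} := \{x : f_{\bar{j}}(x_k) + \langle s, x - x_k \rangle \le 0\}$ is such that $d(x_k, H_{\bar{j}}) \ge d$, where

$$ d := \frac{1}{\kappa\mu} \left[ \|f'(x^*)\| - \sqrt{\|f'(x^*)\|^2 - 2\mu [f(x^*) - f(x_k)]} \right]. $$

(Note the difference between $H_{\bar{j}}$ and $H_{\bar{j},k}^+$.)

Since $x_k^{j-1}$ minimizes $f(\cdot)$ on $H_{j-1,k}^\circ$, and $x_k^j \in H_{j-1,k}^\circ$, we have $\langle f'(x_k^{j-1}), x_k^j - x_k^{j-1} \rangle \ge 0$. Therefore, just like in (4.4), we have

$$ f(x_k^j) - f(x_k^{j-1}) \ge \langle f'(x_k^{j-1}), x_k^j - x_k^{j-1} \rangle + \frac{\mu}{2}\|x_k^j - x_k^{j-1}\|^2 \ge \frac{\mu}{2}\|x_k^j - x_k^{j-1}\|^2. $$

The triangular property implies that $d(x_k^{\bar{j}-1}, H_{\bar{j},k}^+) + \|x_k^{\bar{j}-1} - x_k^0\| \ge d(x_k^0, H_{\bar{j}}) \ge d$. Therefore,

$$ \sum_{l=1}^{\bar{j}} \|x_k^l - x_k^{l-1}\| \ge \|x_k^{\bar{j}} - x_k^{\bar{j}-1}\| + \|x_k^{\bar{j}-1} - x_k^0\| \ge d(x_k^{\bar{j}-1}, H_{\bar{j},k}^+) + \|x_k^{\bar{j}-1} - x_k^0\| \ge d. $$

Then

$$ f(x_k^m) - f(x_k^0) \ge \sum_{l=1}^{\bar{j}} \left[ f(x_k^l) - f(x_k^{l-1}) \right] \ge \frac{\mu}{2} \sum_{l=1}^{\bar{j}} \|x_k^l - x_k^{l-1}\|^2 \ge \frac{\mu}{2\bar{j}} \left[ \sum_{l=1}^{\bar{j}} \|x_k^l - x_k^{l-1}\| \right]^2 \ge \frac{\mu}{2\bar{j}} d^2 \ge \frac{\mu}{2m} d^2. $$

Let $\delta_k := f(x^*) - f(x_k)$. We have $f(x_{k+1}) - f(x_k) \ge f(x_k^m) - f(x_k^0) \ge \frac{\mu}{2m} d^2$, so

$$ \delta_{k+1} \le \delta_k - \frac{\mu}{2m} d^2 = \delta_k - \frac{1}{2\kappa^2 \mu m} \left[ \|f'(x^*)\| - \sqrt{\|f'(x^*)\|^2 - 2\mu [f(x^*) - f(x_k)]} \right]^2 = \delta_k - \frac{\|f'(x^*)\|^2}{2\kappa^2 \mu m} \left[ 1 - \sqrt{1 - \frac{2\mu}{\|f'(x^*)\|^2} \delta_k} \right]^2. $$

Applying Lemma 4.4(b) gives us the result we need. Lemma 4.5(b) still applies to give (4.2). □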


One would expect Algorithm 4.2 to be better than Algorithm 4.1 and converge faster than the conservative estimate of the convergence rate in Theorem 4.8.

5. Constrained optimization with strongly convex objective

Consider the strategy of using Algorithm 4.1 to solve (1.1), where $f \in \mathcal{S}^{1,1}_{\mu,L}$ and $\{f_j(\cdot)\}_{j=1}^{m}$ satisfies the linear metric inequality. A difficulty of Algorithm 4.1 is in Steps 0 and 2, where one has to minimize $f(\cdot)$ over the intersection of two halfspaces. A natural question to ask is whether an approximate minimizer would suffice, and how much effort is needed to calculate this approximate solution. In this section, we show how to get around this difficulty by using steepest descent steps to find an approximate minimizer of $f(\cdot)$ on the intersection of two halfspaces, leading to an algorithm that has a convergence rate comparable to Algorithm 4.1.

We first recall the constrained steepest descent of functions in $\mathcal{S}^{1,1}_{\mu,L}$ constrained over a simple set and recall its convergence properties to the minimizer.

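Algorithm 5.1. (Constrained gradient algorithm) Consider $f : \mathbb{R}^n \to \mathbb{R}$ in $\mathcal{S}^{1,1}_{\mu,L}$ and a closed convex set $Q \subset \mathbb{R}^n$. Choose $x_0 \in Q$. The constrained gradient algorithm to solve

$$ \min\{f(x) : x \in Q\} $$

runs as follows:

At iteration $k$ (where $k \ge 0$), $x_{k+1} = x_k - hL\left[x_k - P_Q\left(x_k - \frac{1}{L} f'(x_k)\right)\right]$. For $h = 1/L$, this reduces to $x_{k+1} = P_Q\left(x_k - \frac{1}{L} f'(x_k)\right)$.

Associated with the steepest descent algorithm is the following result. See for example [Nes04, Theorem 2.2.8].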


Theorem 5.2. (Linear convergence to optimizer of gradient algorithm) Consider Algorithm 5.1. Let $x^*$ be the minimizer. If $h = 1/L$, then

$$ \|x_{k+1} - x^*\|^2 \le \left(1 - \frac{\mu}{L}\right) \|x_k - x^*\|^2. $$

Actually, the optimal algorithm of [Nes04] gives a better ratio of $(1 - \sqrt{\mu/L})$ in place of $(1 - \frac{\mu}{L})$, but the ratio $(1 - \frac{\mu}{L})$ is sufficient for our purposes. In problems whose main difficulty is in handling a large number of constraints rather than the dimension of the problem, algorithms which converge faster than first order algorithms can be used instead. A different choice of algorithm would however not affect our subsequent analysis.
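
The projected gradient step with $h = 1/L$ and a contraction of the form $\|x_{k+1} - x^*\|^2 \le (1 - \mu/L)\|x_k - x^*\|^2$ can be observed on a small example. The sketch below assumes the separable quadratic $f(x) = \frac{1}{2}(x_1 - 2)^2 + 2(x_2 - 1)^2$, for which $\mu = 1$ and $L = 4$, and the halfspace $Q = \{x : x_1 \le 1\}$, whose constrained minimizer is $(1, 1)$. The problem data and function names are illustrative assumptions.

```python
import numpy as np

def constrained_gradient(grad, project, x0, L, iters):
    """Projected gradient iteration x_{k+1} = P_Q(x_k - grad(x_k) / L)."""
    x = np.asarray(x0, dtype=float)
    path = [x]
    for _ in range(iters):
        x = project(x - grad(x) / L)
        path.append(x)
    return path

grad = lambda x: np.array([x[0] - 2.0, 4.0 * (x[1] - 1.0)])   # gradient of f
project = lambda x: np.array([min(x[0], 1.0), x[1]])          # projection onto Q
path = constrained_gradient(grad, project, [-3.0, 5.0], L=4.0, iters=50)

x_star = np.array([1.0, 1.0])
errs = [float(np.sum((x - x_star) ** 2)) for x in path]       # squared distances to x*
```

In this run the squared error shrinks by a factor of at most $0.75 = 1 - \mu/L$ per step, and the iterates reach the minimizer exactly once the constraint becomes active.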

We now state our algorithm. 


Algorithm 5.3. (Constrained optimization with objective in $S^{1,1}_{\mu,L}$) Consider $f : \mathbb{R}^n \to \mathbb{R}$ in $S^{1,1}_{\mu,L}$ and let $f_j : \mathbb{R}^n \to \mathbb{R}$, where $j \in \{1,\dots,m\}$, be linearly regular
convex functions.

01 Separating halfspace procedure:
02 For a point $x \in \mathbb{R}^n$, a separating halfspace is found as follows:
03 For $j \in \{1,\dots,m\}$,
04 Find some $g_j \in \partial f_j(x)$
05 Let
\[ H_j := \begin{cases} \{x' : f_j(x) + \langle g_j, x' - x\rangle \le 0\} & \text{if } f_j(x) > 0 \\ \mathbb{R}^n & \text{otherwise.} \end{cases} \]
06 end for
07 Let $H^+ = H_j$, where $j = \arg\max_{1 \le j \le m} d(x, H_j)$.

01 Main Algorithm:
02 Let $H_1^\circ = \mathbb{R}^n$ and $H_1^+ = \mathbb{R}^n$ and let $x^0$ be a starting iterate. Let $\bar\alpha > 0$.
03 For $k = 1, \dots$
04 Let $x_k^*$ be the minimizer of $f(\cdot)$ on $H_k^\circ \cap H_k^+$.
05 Starting from $x_k^0$, perform constrained gradient iterations (Algorithm 5.1)
06 for solving $\min\{f(x) : x \in H_k^\circ \cap H_k^+\}$ to find $x_k$ such that
07 (1) $\|x_k - x_k^*\| \le \bar\alpha/k^2$, and
08 (2) $d(x_k, H_{k+1}^+) \ge 2\|x_k - x_k^*\|$, where $H_{k+1}^+$ is a halfspace obtained
09 from the separating halfspace procedure with input $x_k$.
10 If $H_k^\circ \ne \mathbb{R}^n$ and $H_k^+ \ne \mathbb{R}^n$ (i.e., both $H_k^\circ$ and $H_k^+$ are proper halfspaces)
11 Combine halfspaces $H_k^\circ$ and $H_k^+$ to form one halfspace $H_{k+1}^\circ$:
12 If $\partial H_k^\circ \cap \partial H_k^+ = \emptyset$ or $d(x_k, \partial H_k^\circ \cap \partial H_k^+) > \bar\alpha/k^2$
13 Let $H_{k+1}^\circ = H_k^+$
14 else
15 Project $-f'(x_k)$ onto $\mathrm{cone}(\{n_k^\circ, n_k^+\})$ to get $v \in \mathbb{R}^n$,
16 where $n_k^\circ$ and $n_k^+$ are the outward normal vectors of $H_k^\circ$ and $H_k^+$.
17 Project $x_k$ onto $\partial H_k^\circ \cap \partial H_k^+$ to get $\tilde{x}_k$.
18 Let $H_{k+1}^\circ := \{x : \langle v, x - \tilde{x}_k\rangle \le 0\}$
19 end if
20 else
21 Let $H_{k+1}^\circ = H_k^+$
22 end if
23 end for
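The separating halfspace procedure can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: the function name, the representation of each constraint as a pair of callables `(f_j, subgrad_j)`, and the encoding of a halfspace as `(g, c)` meaning $\{x' : \langle g, x'\rangle \le c\}$ are all assumptions made here. It also assumes that a violated constraint has a nonzero subgradient, which holds whenever the feasible set is nonempty.

```python
import math

def separating_halfspace(x, constraints):
    """constraints: list of (f_j, subgrad_j) pairs of callables (hypothetical interface).

    Returns (g, c, dist) for the halfspace {x' : <g, x'> <= c} built from the
    violated constraint whose linearization is farthest from x, or None when
    x satisfies every constraint (every H_j equals R^n).
    """
    best = None
    for f_j, subgrad_j in constraints:
        value = f_j(x)
        if value <= 0.0:
            continue  # H_j = R^n, so d(x, H_j) = 0
        g = subgrad_j(x)
        norm_g = math.sqrt(sum(gi * gi for gi in g))  # assumed nonzero
        dist = value / norm_g  # distance from x to H_j
        if best is None or dist > best[2]:
            # f_j(x) + <g, x' - x> <= 0  is the same as  <g, x'> <= <g, x> - f_j(x)
            c = sum(gi * xi for gi, xi in zip(g, x)) - value
            best = (g, c, dist)
    return best

# Example: the unit ball and the halfspace x1 <= 0.5, queried at x = (2, 0).
constraints = [
    (lambda x: x[0] ** 2 + x[1] ** 2 - 1.0, lambda x: [2.0 * x[0], 2.0 * x[1]]),
    (lambda x: x[0] - 0.5, lambda x: [1.0, 0.0]),
]
result = separating_halfspace([2.0, 0.0], constraints)
```

In this example the linearized ball constraint lies at distance 0.75 from the query point and the affine constraint at distance 1.5, so the second one is selected.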

Algorithm 5.3 is actually a two stage process. We refer to the iterations of finding
$\{x_k\}$, $\{H_k^\circ\}$ and $\{H_k^+\}$ as the outer iterations, and the iterations of the constrained
steepest descent algorithm to find $x_k$ as the inner iterations.

We didn’t mention the starting iterate for the constrained steepest descent 
algorithm. We can let = Xk-i for fc > 1, but setting x^ = xl is sufficient for our 
analysis. 

Throughout the algorithm, the points $x_k^*$ are not found explicitly. The distance
$\|x_k - x_k^*\|$ can be estimated from Theorem 5.2.

We make a few remarks about Algorithm 5.3. At the beginning of the algorithm,
the sets $H_1^\circ$ and $H_1^+$ equal $\mathbb{R}^n$, but after some point, they become proper halfspaces.
It is clear from the construction of $H_{k+1}^\circ$ that $H_k^\circ \cap H_k^+ \subset H_{k+1}^\circ$, and that $H_k^+$ are
designed so that $C \subset H_k^+$, so $C \subset H_k^\circ$.

Assume that $H_k^\circ$ and $H_k^+$ are proper halfspaces. Then the sets $\partial H_k^\circ$ and $\partial H_k^+$
are affine spaces with codimension 1. In order for $\partial H_k^\circ \cap \partial H_k^+ = \emptyset$, the normals of
the halfspaces have to be in the same direction. The condition

\[ d(x_k, \partial H_k^\circ \cap \partial H_k^+) > \frac{\bar\alpha}{k^2} \ge \|x_k - x_k^*\| \]

implies that $x_k^*$ cannot be on $\partial H_k^\circ \cap \partial H_k^+$, so $x_k^*$ has to lie only on either $\partial H_k^\circ$ or
$\partial H_k^+$, but not both. By the workings of Algorithm 5.3, $x_k^*$ cannot lie on $\partial H_k^\circ$, and
must lie on $\partial H_k^+$. This explains why $H_{k+1}^\circ = H_k^+$ in the situations specified.
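The step that combines two proper halfspaces (projecting $-f'(x_k)$ onto the cone of the two outward normals and building a new halfspace through $\tilde{x}_k$) can be sketched as follows. This is only an illustration under stated assumptions: the two normals are taken to be linearly independent, the point `x_tilde` is assumed to have been computed already, and the function names and the `(v, c)` encoding of $\{x : \langle v, x\rangle \le c\}$ are hypothetical.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def project_onto_cone(w, n1, n2):
    """Project w onto cone({n1, n2}) = {a*n1 + b*n2 : a, b >= 0}.

    Assumes n1 and n2 are linearly independent.
    """
    g11, g12, g22 = dot(n1, n1), dot(n1, n2), dot(n2, n2)
    r1, r2 = dot(n1, w), dot(n2, w)
    det = g11 * g22 - g12 * g12
    # Unconstrained least squares coefficients from the 2x2 normal equations.
    a = (g22 * r1 - g12 * r2) / det
    b = (g11 * r2 - g12 * r1) / det
    if a < 0.0 or b < 0.0:
        # The projection lies on one of the two rays (possibly at the origin).
        a1, b2 = max(r1 / g11, 0.0), max(r2 / g22, 0.0)
        err1 = sum((wi - a1 * ni) ** 2 for wi, ni in zip(w, n1))
        err2 = sum((wi - b2 * ni) ** 2 for wi, ni in zip(w, n2))
        a, b = (a1, 0.0) if err1 <= err2 else (0.0, b2)
    return [a * p + b * q for p, q in zip(n1, n2)]

def combined_halfspace(neg_grad, n1, n2, x_tilde):
    """Return (v, c) with {x : <v, x> <= c} = {x : <v, x - x_tilde> <= 0}."""
    v = project_onto_cone(neg_grad, n1, n2)
    return v, dot(v, x_tilde)
```

Handling the two rays separately keeps the sketch free of a general quadratic programming solver, since a cone with two generators has only the origin, the two rays and the relative interior as faces.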

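Theorem 5.4. (Convergence of Algorithm 5.3) Consider Algorithm 5.3. We have

(1) $\{f(x^*) - f(x_k^*)\}_{k=1}^{\infty} \in O(1/k)$.

(2) $\{\|x_k^* - x^*\|\} \in O(1/\sqrt{k})$.

(3) $\{f(x^*) - f(x_k)\} \in O(1/k)$, and $\{\|x_k - x^*\|\} \in O(1/\sqrt{k})$.

Proof. From strong convexity, we have

\[ f(x_{k+1}^*) - f(x_k^*) - \frac{\mu}{2}\|x_{k+1}^* - x_k^*\|^2 \ge \langle f'(x_k^*), x_{k+1}^* - x_k^*\rangle. \tag{5.1} \]

Recall that $n_k^\circ$ and $n_k^+$ are the outward normals of the halfspaces $H_k^\circ$ and $H_k^+$
respectively. The optimality conditions imply that $-f'(x_k^*) \in \mathrm{cone}(\{n_k^\circ, n_k^+\})$.

When $k = 1$ or $2$, the halfspace $H_k^\circ$ equals $\mathbb{R}^n$, so $H_{k+1}^\circ$ equals $H_k^+$. In the
case when $d(x_k, \partial H_k^\circ \cap \partial H_k^+) > \bar\alpha/k^2$, we also have $H_{k+1}^\circ = H_k^+$. In these cases,
$H_{k+1}^\circ = \{x : \langle f'(x_k^*), x - x_k^*\rangle \ge 0\}$. Since $x_{k+1}^* \in H_{k+1}^\circ$, the inequality (5.1) reduces to

\[ f(x_{k+1}^*) - f(x_k^*) - \frac{\mu}{2}\|x_{k+1}^* - x_k^*\|^2 \ge 0. \tag{5.2} \]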

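We now address the other case where $H_k^\circ$ and $H_k^+$ are both proper halfspaces
and $d(x_k, \partial H_k^\circ \cap \partial H_k^+) \le \bar\alpha/k^2$. Since $v = P_{\mathrm{cone}(\{n_k^\circ, n_k^+\})}(-f'(x_k))$ and $-f'(x_k^*) =
P_{\mathrm{cone}(\{n_k^\circ, n_k^+\})}(-f'(x_k^*))$, nonexpansivity of the projection onto the convex set
$\mathrm{cone}(\{n_k^\circ, n_k^+\})$ and the assumption that $f'(\cdot)$ is Lipschitz with parameter $L$ gives
us

\[ \|f'(x_k^*) - (-v)\| \le \|f'(x_k^*) - f'(x_k)\| \le L\|x_k^* - x_k\|. \tag{5.3} \]

The halfspace $H_{k+1}^\circ$ equals $\{x : \langle v, x - \tilde{x}_k\rangle \le 0\}$. We must have $x_{k+1}^* \in H_{k+1}^\circ$,
which gives

\[ \langle -v, x_{k+1}^* - \tilde{x}_k\rangle \ge 0. \tag{5.4} \]

Before we prove (5.6), we note that

\[ \|x_k - \tilde{x}_k\| \le d(x_k, \partial H_k^\circ \cap \partial H_k^+) \le \frac{\bar\alpha}{k^2} \tag{5.5} \]

and that $\|x_k - x_k^*\| \le \bar\alpha/k^2$ is a requirement in Algorithm 5.3. We continue with the
arithmetic in (5.1) to get

\[ \begin{aligned}
& f(x_{k+1}^*) - f(x_k^*) - \frac{\mu}{2}\|x_{k+1}^* - x_k^*\|^2 \\
&\ge \langle f'(x_k^*), x_{k+1}^* - x_k^*\rangle \\
&= \langle -v, x_{k+1}^* - \tilde{x}_k\rangle + \langle -v, \tilde{x}_k - x_k^*\rangle + \langle f'(x_k^*) + v, x_{k+1}^* - x_k^*\rangle \\
&\ge 0 - \|v\|\|\tilde{x}_k - x_k^*\| - \|f'(x_k^*) + v\|\|x_{k+1}^* - x_k^*\| && \text{(by (5.4))} \\
&\ge -\|v\|\big[\|\tilde{x}_k - x_k\| + \|x_k - x_k^*\|\big] - L\|x_k - x_k^*\|\|x_{k+1}^* - x_k^*\| && \text{(by (5.3))} \\
&\ge -\frac{\bar\alpha}{k^2}\big[2\|v\| + L\|x_{k+1}^* - x_k^*\|\big] && \text{(by (5.5))}.
\end{aligned} \tag{5.6} \]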
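Next, since $v$ is the projection of $-f'(x_k)$ onto $\mathrm{cone}(\{n_k^\circ, n_k^+\})$, which is a convex set
containing the origin, we have $\|v\| \le \|f'(x_k)\|$. Note that $\|x_k - x_k^*\| \le \bar\alpha/k^2 \le \bar\alpha$. So

\[ \begin{aligned}
\|v\| &\le \|f'(x_k)\| \\
&\le \|f'(x_k^*)\| + \|f'(x_k) - f'(x_k^*)\| \\
&\le \|f'(x_k^*)\| + L\|x_k - x_k^*\| \\
&\le \max\{\|f'(x)\| : f(x) \le f(x^*)\} + L\bar\alpha.
\end{aligned} \]

Since $f(x_k^*) \le f(x^*)$ and $f(x_{k+1}^*) \le f(x^*)$, the strong convexity of $f(\cdot)$ implies that
$x_k^*$ and $x_{k+1}^*$ both lie in a bounded set. (See Lemma 5.5.) Therefore, there is a
constant $\tilde\alpha > 0$ such that $\bar\alpha[L\|x_{k+1}^* - x_k^*\| + 2\|v\|] \le \tilde\alpha$. Continuing from (5.6), we
have

\[ f(x_{k+1}^*) - f(x_k^*) - \frac{\mu}{2}\|x_{k+1}^* - x_k^*\|^2 \ge -\frac{\tilde\alpha}{k^2}. \tag{5.7} \]

Since $x_{k+1}^* \in H_{k+1}^+$, we have $\|x_{k+1}^* - x_k^*\| \ge d(x_k^*, H_{k+1}^+)$. We now prove that
$d(x_k^*, H_{k+1}^+) \ge \frac{1}{3\kappa} d(x_k^*, C)$. We get $\frac{1}{\kappa} d(x_k, C) \le d(x_k, H_{k+1}^+)$ from linear metric
inequality. From $2\|x_k - x_k^*\| \le d(x_k, H_{k+1}^+)$ and the triangular inequality
$d(x_k, H_{k+1}^+) \le \|x_k - x_k^*\| + d(x_k^*, H_{k+1}^+)$, we have $\|x_k - x_k^*\| \le d(x_k^*, H_{k+1}^+)$. Together
with the fact that $\kappa \ge 1$, we have

\[ \begin{aligned}
\frac{1}{\kappa} d(x_k^*, C) &\le \frac{1}{\kappa}\|x_k - x_k^*\| + \frac{1}{\kappa} d(x_k, C) \\
&\le \|x_k - x_k^*\| + d(x_k, H_{k+1}^+) \\
&\le 2\|x_k - x_k^*\| + d(x_k^*, H_{k+1}^+) \\
&\le 3 d(x_k^*, H_{k+1}^+).
\end{aligned} \]

We then have

\[ \begin{aligned}
\|x_{k+1}^* - x_k^*\| &\ge d(x_k^*, H_{k+1}^+) \\
&\ge \frac{1}{3\kappa} d(x_k^*, C) \\
&\ge \frac{1}{3\kappa}\left[ \frac{1}{\mu}\|f'(x^*)\| - \sqrt{\frac{1}{\mu^2}\|f'(x^*)\|^2 - \frac{2}{\mu}[f(x^*) - f(x_k^*)]} \right].
\end{aligned} \tag{5.8} \]

(The third inequality comes from Lemma 4.5(a).)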

Let ei = and 62 = \\f'\x-‘)\\ ■ Putting (j5.8H into formulas (15.211 and (15.711 

gives 

2 


f{x*) - /(4+i) < f{x*) - fixl) - 


/iei 


1 - - £2[/(a;*) - fixl)] 


a 


Ofc+i < Ofc — 


1 2 


1-41^^ 


a 


where SI := f{x*) — f{xl). Part (1) now follows from Lemma [HTf al. 

The optimality conditions on $x_k^*$ imply that the point $x^*$ must lie in the halfspace
$\{x : \langle f'(x_k^*), x - x_k^*\rangle \ge 0\}$. Lemma 4.5(b) then implies

\[ \|x^* - x_k^*\|^2 \le \frac{2}{\mu}[f(x^*) - f(x_k^*)]. \]

The claim in (2) follows immediately from the above inequality and (1).

To see (3), note that since $f(\cdot)$ is locally Lipschitz at $x^*$, we have

\[ \limsup_{k \to \infty} \frac{|f(x_k) - f(x_k^*)|}{\|x_k - x_k^*\|} < \infty. \]

Since $\{\|x_k - x_k^*\|\} \in O(1/k^2)$, we have $\{|f(x_k) - f(x_k^*)|\} \in O(1/k^2)$. Next, since
$\{|f(x^*) - f(x_k^*)|\} \in O(1/k)$, we have $\{|f(x^*) - f(x_k)|\} \in O(1/k)$ as needed. The
other inequality $\{\|x_k - x^*\|\} \in O(1/\sqrt{k})$ can also be proved with these steps. □

Lemma 5.5. (Estimate of $x_k^*$) In Algorithm 5.3, the points $x_k^*$ satisfy

\[ \left\| x_k^* - x^* + \frac{1}{\mu} f'(x^*) \right\| \le \frac{1}{\mu}\|f'(x^*)\|. \]

Proof. From the $\mu$-strong convexity of $f(\cdot)$ and the fact that $f(x_k^*) \le f(x^*)$, we
have $f(x^*) + \langle f'(x^*), x_k^* - x^*\rangle + \frac{\mu}{2}\|x_k^* - x^*\|^2 \le f(x_k^*) \le f(x^*)$, from which the
conclusion follows by completing the square. □

5.1. Computational effort of Algorithm 5.3. We now calculate the amount
of computational effort that Algorithm 5.3 takes to find an iterate $x_k$ such that
$|f(x_k) - f(x^*)| \le \epsilon$. The number of outer iterations needed to find the iterate $x_k$
is, by definition, $k$. It therefore remains to calculate the number of inner iterations
corresponding to each outer iteration.

Consider the case when $\|x_k^* - x^*\|$ is small (or even zero) for the final iteration $k$.
Even though it means that the outer iterations in Algorithm 5.3 have done well to
allow us to get a good $x_k$ once the required number of inner iterations are performed,
the number of inner iterations needed to satisfy $d(x_k, H_{k+1}^+) \ge 2\|x_k - x_k^*\|$ can
be excessively large. In view of this difficulty, we leave out the number of inner
iterations associated with the last outer iterate. Nevertheless, when $d(x_k, H_{k+1}^+)$
and $\|x_k - x_k^*\|$ are small, we have the following estimates on $f(x^*) - f(x_k^*)$, $\|x_k^* - x^*\|$,
and hence $|f(x^*) - f(x_k)|$ and $\|x_k - x^*\|$ in (5.10) of Theorem 5.6 from quantities
that are calculated throughout Algorithm 5.3.


Theorem 5.6. (Performance estimates) Consider Algorithm 5.3. Let $H^*$ be the
halfspace $\{x : \langle f'(x^*), x - x^*\rangle \ge 0\}$. We have

(1) $0 \le f(x^*) - f(x_k^*) \le \|f'(x^*)\| d(x_k^*, H^*)$.

(2) $\|x_k^* - x^*\| \le \sqrt{\frac{2}{\mu}\|f'(x^*)\| d(x_k^*, H^*)}$.

Suppose $\{f_j(\cdot)\}_{j=1}^m$ satisfies the $\kappa$ linear metric inequality, let $x_k$ be an iterate of the
minimization subproblem, and define

\[ \bar{d} := \big[\|x_k - x_k^*\| + \kappa\, d(x_k, H_{k+1}^+)\big]. \]

Then

\[ d(x_k^*, H^*) \le \bar{d}. \tag{5.9} \]

Hence if $f(\cdot)$ is Lipschitz with constant $M$ in a neighborhood $U$ of $x^*$ and both $x_k$
and $x_k^*$ lie in $U$, then

\[ |f(x_k) - f(x^*)| \le \|f'(x^*)\|\bar{d} + M\|x_k - x_k^*\| \tag{5.10a} \]

\[ \text{and} \quad \|x_k - x^*\| \le \|x_k - x_k^*\| + \sqrt{\tfrac{2}{\mu}\|f'(x^*)\|\bar{d}}. \tag{5.10b} \]


Proof. Recall that $x^*$ is the solution to the original problem. Since $f(x_k^*) \le f(x^*)$,
then either $d(x_k^*, H^*) > 0$ or $x_k^* = x^*$. When $x_k^* = x^*$, all the conclusions in our
result would be true, so we only look at the first case. It is clear that $d(x_k^*, H^*) =
\left\langle -\frac{f'(x^*)}{\|f'(x^*)\|}, x_k^* - x^* \right\rangle$. By the convexity of $f(\cdot)$, we have

\[ f(x^*) - f(x_k^*) \le \langle -f'(x^*), x_k^* - x^*\rangle = \|f'(x^*)\| d(x_k^*, H^*). \tag{5.11} \]

Next, we find an upper bound for $\|x_k^* - x^*\|$. Lemma 5.5 states that $x_k^*$ lies in a
ball with radius $\|\frac{1}{\mu} f'(x^*)\|$, center $z := x^* - \frac{1}{\mu} f'(x^*)$, and has the point $x^*$ on its
boundary. See Figure 5.1. The furthest point $x$ in this ball from $x^*$ that satisfies
$d(x, H^*) \le d(x_k^*, H^*)$ has to be such that $d(x, H^*) = d(x_k^*, H^*)$ and $x$ being on the
boundary of this ball.

Finding an upper bound for $\|x_k^* - x^*\|$ is now an easy exercise in trigonometry.
Let $\theta$ be the angle that the line through $x$ and $x^*$ makes with $\partial H^*$. We thus have
$\angle x z x^* = 2\theta$. So $\cos 2\theta = \frac{\|\frac{1}{\mu} f'(x^*)\| - d(x_k^*, H^*)}{\|\frac{1}{\mu} f'(x^*)\|}$. Making use of $\cos 2\theta = 1 - 2[\sin\theta]^2$,
we have

\[ \sin\theta = \sqrt{\frac{\mu\, d(x_k^*, H^*)}{2\|f'(x^*)\|}}. \]

Figure 5.1. Diagram used in the proof of Theorem 5.6. The
distances $d_1$ and $d_2$ equal $\|\frac{1}{\mu} f'(x^*)\|$ and $\|\frac{1}{\mu} f'(x^*)\| - d(x_k^*, H^*)$
respectively.

An upper bound for $\|x_k^* - x^*\|$ is thus $d(x_k^*, H^*)/\sin\theta$, so

\[ \|x_k^* - x^*\| \le d(x_k^*, H^*)/\sin\theta = \sqrt{\tfrac{2}{\mu}\|f'(x^*)\| d(x_k^*, H^*)}. \tag{5.12} \]

We have

\[ d(x_k^*, H^*) \le d(x_k^*, C) \le \|x_k - x_k^*\| + d(x_k, C) \le \|x_k - x_k^*\| + \kappa\, d(x_k, H_{k+1}^+) = \bar{d}. \tag{5.13} \]

To get (5.10a), we make use of (5.11), (5.9) and the assumption that $f(\cdot)$ is Lipschitz
with constant $M$ to get

\[ \begin{aligned}
|f(x_k) - f(x^*)| &\le |f(x^*) - f(x_k^*)| + |f(x_k^*) - f(x_k)| \\
&\le \|f'(x^*)\|\bar{d} + M\|x_k - x_k^*\|.
\end{aligned} \]

Formula (5.10b) follows easily from (5.12). □
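A small numerical instance can make the two estimates in Theorem 5.6 concrete. The sketch below uses an assumed toy problem that is not from the paper: $f(x) = \frac{1}{2}\|x - c\|^2$ (so $\mu = 1$), $C = \{x : x_1 \le 0,\ x_2 \le 0\}$, and an outer approximation consisting of the single halfspace $\{x : x_1 \le 0\}$, whose minimizer plays the role of $x_k^*$.

```python
import math

def f(x, c):
    """Assumed objective f(x) = 0.5*||x - c||^2, strongly convex with mu = 1."""
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))

c = [1.0, 1.0]
mu = 1.0
x_star = [0.0, 0.0]      # projection of c onto C = {x1 <= 0, x2 <= 0}
x_k_star = [0.0, 1.0]    # minimizer of f on the outer approximation {x1 <= 0}

grad_star = [xs - ci for xs, ci in zip(x_star, c)]   # f'(x*) = x* - c
grad_norm = math.sqrt(sum(g * g for g in grad_star))

# d(x_k^*, H*) with H* = {x : <f'(x*), x - x*> >= 0}.
slack = sum(g * (xk - xs) for g, xk, xs in zip(grad_star, x_k_star, x_star))
dist_to_H_star = max(-slack, 0.0) / grad_norm

gap = f(x_star, c) - f(x_k_star, c)                               # f(x*) - f(x_k^*)
err = math.sqrt(sum((xk - xs) ** 2 for xk, xs in zip(x_k_star, x_star)))
bound_gap = grad_norm * dist_to_H_star                            # estimate (1)
bound_err = math.sqrt(2.0 / mu * grad_norm * dist_to_H_star)      # estimate (2)
```

Here `gap` and `err` are the true quantities, while `bound_gap` and `bound_err` are the right-hand sides of (1) and (2).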


We now calculate the number of inner iterations needed for outer iterations
$j \in \{1,\dots,k-1\}$ so that $|f(x_k) - f(x^*)| \le \epsilon$. As seen in Theorem 5.4, the
convergence rate of $|f(x_k) - f(x^*)|$ is $O(1/k)$, or in other words, $O(1/\epsilon)$ outer
iterations would ensure that $|f(x_k) - f(x^*)| \le \epsilon$.

To ensure that $\|x_j - x_j^*\| \le \bar\alpha/j^2$ for $j \in \{1,\dots,k-1\}$, we need $O(\log(j^2))$ iterations.
Since the number of iterations $k$ is $O(1/\epsilon)$, we need at least $O(\log(1/\epsilon^2))$ iterations
for the $(k-1)$th inner subproblem. We now proceed to find how the condition
$d(x_k, H_{k+1}^+) \ge 2\|x_k - x_k^*\|$ affects the number of inner iterations in each outer
iteration. We have the following inequalities.


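Remark 5.7. If $\|x_k - x_k^*\| \le \frac{1}{3\kappa} d(x_k^*, C)$, then linear metric inequality, the fact that
$\kappa \ge 1$, and the triangular inequality imply

\[ \begin{aligned}
d(x_k, H_{k+1}^+) &\ge \frac{1}{\kappa} d(x_k, C) \\
&\ge \frac{1}{\kappa} d(x_k^*, C) - \frac{1}{\kappa}\|x_k - x_k^*\| \\
&\ge 3\|x_k - x_k^*\| - \|x_k - x_k^*\| \\
&= 2\|x_k - x_k^*\|.
\end{aligned} \]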


Remark 5.7 implies that in the $j$th outer iteration, the number of inner iterations
it takes to get $d(x_j, H_{j+1}^+) \ge 2\|x_j - x_j^*\|$ is at most the number of iterations it takes
to get $\|x_j - x_j^*\| \le \frac{1}{3\kappa} d(x_j^*, C)$.

Proposition 5.8. We continue the discussion of this subsection. Suppose $f(\cdot)$ is
Lipschitz with constant $M$. If $d(x_k^*, C) \le \frac{\epsilon}{\|f'(x^*)\| + M}$, and $d(x_k, H_{k+1}^+) \ge 2\|x_k - x_k^*\|$,
then $|f(x_k) - f(x^*)| \le \epsilon$.

Proof. We first prove that $d(x_k, H_{k+1}^+) \ge 2\|x_k - x_k^*\|$ implies $\|x_k - x_k^*\| \le d(x_k^*, C)$.
We have

\[ d(x_k^*, C) \ge d(x_k, C) - \|x_k - x_k^*\| \ge d(x_k, H_{k+1}^+) - \|x_k - x_k^*\| \ge \|x_k - x_k^*\|. \]

Finally, making use of Theorem 5.6(1), we have

\[ \begin{aligned}
|f(x_k) - f(x^*)| &\le |f(x^*) - f(x_k^*)| + |f(x_k) - f(x_k^*)| \\
&\le \|f'(x^*)\| d(x_k^*, C) + M\|x_k - x_k^*\| \\
&\le \|f'(x^*)\| d(x_k^*, C) + M d(x_k^*, C) \\
&\le \epsilon.
\end{aligned} \]

□

So we must have $d(x_j^*, C) > \frac{\epsilon}{\|f'(x^*)\| + M}$ for all $j \in \{1,\dots,k-1\}$. For the outer
iterations $j \in \{1,\dots,k-1\}$, Remark 5.7 imposes that the number of inner iterations
needs to allow us to get $\|x_j - x_j^*\| \le \frac{\epsilon}{3\kappa[\|f'(x^*)\| + M]}$. The number of inner iterations
for outer iterate $j \in \{1,\dots,k-1\}$ needs to be at least $O(\log(1/\epsilon))$, which is less
than the $O(\log(1/\epsilon^2))$ obtained earlier. So the total number of inner iterations in
outer iterations $j \in \{1,\dots,k-1\}$ that is needed to get $|f(x_k) - f(x^*)| \le \epsilon$ is
$O(\frac{1}{\epsilon}\log(1/\epsilon^2))$. The corresponding number of inner iterations to get $\|x_k - x^*\| \le \epsilon$
can be similarly calculated to be $O(\frac{1}{\epsilon^2}\log(1/\epsilon^4))$.
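The counting argument above can be turned into a rough calculator. The sketch below is an illustration under simplifying assumptions that are not in the paper: every inner subproblem is assumed to start at distance at most `radius` from its minimizer, only the accuracy requirement $\|x_j - x_j^*\| \le \bar\alpha/j^2$ is counted, and the distance is assumed to contract by $\sqrt{1 - \mu/L}$ per inner iteration (the square root of the ratio in Theorem 5.2), with $\mu < L$. The function names are hypothetical.

```python
import math

def inner_iterations(j, abar, radius, mu, L):
    """Smallest t with radius * (1 - mu/L)**(t/2) <= abar / j**2 (assumes mu < L)."""
    target = abar / j ** 2
    if radius <= target:
        return 0
    rate = math.sqrt(1.0 - mu / L)  # per-iteration contraction of the distance
    return math.ceil(math.log(target / radius) / math.log(rate))

def total_inner_iterations(k, abar, radius, mu, L):
    """Inner iterations summed over outer iterations 1, ..., k - 1."""
    return sum(inner_iterations(j, abar, radius, mu, L) for j in range(1, k))
```

Since each term grows like $\log(j^2)$, the sum over $k - 1$ outer iterations grows like $k \log(k^2)$, which matches the $O(\frac{1}{\epsilon}\log(1/\epsilon^2))$ count when $k$ is of order $1/\epsilon$.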

6. Lower bounds on effectiveness of projection algorithms 

In this section, we derive a lower bound that describes the absolute rate of convergence
of first order algorithms where one projects onto component sets to explore
the feasible set. Let $f : \mathbb{R}^n \to \mathbb{R}$ be a convex function. When (1.1) is restricted
to the case where $f_j(x)$ is an affine function and $Q = \mathbb{R}^n$, we have the following
problem

\[ \begin{array}{ll}
\min & f(x) \\
\text{s.t.} & x \in \bigcap_{j=1}^m H_j \\
& x \in \mathbb{R}^n,
\end{array} \tag{6.1} \]

where $H_j$ are halfspaces. In the case where $m$ and $n$ are large, only first order
algorithms are capable of handling the large size of the problems. So absolute
bounds rather than asymptotic bounds are more appropriate for the analysis of the
speed of convergence of the algorithms. Motivated by the analysis in [Nes04], we
consider the following algorithm.

Algorithm 6.1. (Algorithm to analyze (6.1)) Suppose in (6.1), we have the following
algorithm. Let $x_0$ be a starting iterate.

01 Set $S_0 = \emptyset$.
02 For iteration $k \ge 1$
03 Find $i_k \in \{1,\dots,m\} \setminus S_{k-1}$, and set $S_k = S_{k-1} \cup \{i_k\}$.
04 Find objective value $f_k$ of $\min\{f(x) : x \in \bigcap_{j \in S_k} H_j\}$.
05 End for.

A lower bound on the absolute rate of convergence of Algorithm 6.1 would give
an absolute bound on how algorithms that explore the feasible set by projection
can converge.

We look in particular at the problem

\[ \begin{array}{ll}
\min & \|e_1 - x\|_p^p \\
\text{s.t.} & \langle [e_1 + \epsilon e_{j+1}], x\rangle \le 0 \text{ for } j \in \{1,\dots,n-1\} \\
& x \in \mathbb{R}^n,
\end{array} \tag{6.2} \]

where $\|\cdot\|_p$ is the usual $p$ norm defined by $\|x\|_p^p = \sum_{i=1}^n |x_i|^p$, and $e_i$ are the elementary
vectors with 1 on the $i$th component and 0 everywhere else. We also restrict $p$ to
be a positive even integer, so that the objective function is seen to be convex.
First, we prove that the constraints satisfy the linear metric inequality.


Proposition 6.2. (Linear metric inequality in (6.2)) The sets in the constraints
of (6.2) satisfy the linear metric inequality.


Proof. The unit normal of each halfspace is $\frac{1}{\sqrt{1+\epsilon^2}}[e_1 + \epsilon e_{j+1}]$ for each $j \in \{1,\dots,n-1\}$.
The distance from the origin to the convex hull of these unit normals is at least
$\frac{1}{\sqrt{1+\epsilon^2}}$. We can make use of the results in [Kru06] for example, which contain
what we need. (In fact, much more than what we need.) In the notation of that 
paper, linear metric inequality follows from establishing i? > 0 given 77 > 0. The¬ 
orems I(i) and 2(ii) there give § = 0 and ry < respectively. These imply 




> -tH- 

— i+u 


> 0 . 


□ 


When Algorithm 6.1 is applied to (6.2), the symmetry of the problem implies
that we can take $S_k = \{1,\dots,k\}$. We now calculate the objective value when $k$ of
the constraints in (6.2) are considered.


Proposition 6.3. (Calculating $f_k$) In (6.2), let $p$ be any positive even integer.
Let $f_k$ be the optimal value of (6.2) when only $k$ of the $n - 1$ constraints are taken
into account. We have

\[ f_k = \frac{k\theta}{[1 + (k\theta)^{1/(p-1)}]^{p-1}}, \qquad \text{where } \theta = \epsilon^{-p}. \]


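Proof. The function $x \mapsto \|e_1 - x\|_p^p$ is seen to be strictly convex, so there is a unique
minimizer. Let $\bar{x}$ be the minimizer of the $k$th subproblem. The symmetry of the
problem implies that the second to $(k+1)$th components of $\bar{x}$ have the same value,
say $\beta$, and the $(k+2)$th to $n$th components of $\bar{x}$ are zero. Moreover, all the inequality
constraints are tight. Let the first component have the value $\alpha$. We now see that
$f_k$ equals the objective value of the following problem

\[ \begin{array}{rl}
f_k = \min_{(\alpha,\beta)} & (1 - \alpha)^p + k\beta^p \\
\text{s.t.} & \alpha + \epsilon\beta = 0.
\end{array} \]

We have $\beta = -\frac{1}{\epsilon}\alpha$. Since $p$ is even, $k\beta^p = k\epsilon^{-p}\alpha^p = k\theta\alpha^p$. Let $\bar\theta = k\theta$. We have

\[ f_k = \min_{\alpha}\, (1 - \alpha)^p + \bar\theta\alpha^p. \]

The derivative of the above function with respect to $\alpha$ equals

\[ -p(1 - \alpha)^{p-1} + \bar\theta p\,\alpha^{p-1}. \]

Setting the above to zero gives us

\[ \begin{aligned}
\frac{1 - \alpha}{\alpha} &= \bar\theta^{1/(p-1)} \\
\alpha\big(1 + \bar\theta^{1/(p-1)}\big) &= 1 \\
\alpha &= \frac{1}{1 + \bar\theta^{1/(p-1)}}.
\end{aligned} \]

This gives us

\[ \begin{aligned}
f_k &= (1 - \alpha)^p + \bar\theta\alpha^p \\
&= \left(\frac{\bar\theta^{1/(p-1)}}{1 + \bar\theta^{1/(p-1)}}\right)^p + \bar\theta\left(\frac{1}{1 + \bar\theta^{1/(p-1)}}\right)^p \\
&= \frac{\bar\theta^{p/(p-1)} + \bar\theta}{[1 + \bar\theta^{1/(p-1)}]^p} \\
&= \frac{\bar\theta}{[1 + \bar\theta^{1/(p-1)}]^{p-1}},
\end{aligned} \]

which is what we need. □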
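The closed form for $f_k$ can be cross-checked against a direct one-dimensional minimization. The sketch below is only a sanity check: it takes $\theta = \epsilon^{-p}$ (the value obtained by substituting $\beta = -\alpha/\epsilon$ into $k\beta^p$ for even $p$), and it uses a ternary search on $[0, 1]$, which is a choice made here for simplicity rather than anything prescribed by the paper. Function names are hypothetical.

```python
def f_k_closed_form(k, eps, p):
    """f_k = k*theta / (1 + (k*theta)**(1/(p-1)))**(p-1), with theta = eps**(-p)."""
    kt = k * eps ** (-p)
    return kt / (1.0 + kt ** (1.0 / (p - 1))) ** (p - 1)

def f_k_numeric(k, eps, p, iters=200):
    """Minimize (1 - a)**p + k*theta*a**p over a in [0, 1] by ternary search."""
    kt = k * eps ** (-p)

    def phi(a):
        return (1.0 - a) ** p + kt * a ** p

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if phi(m1) <= phi(m2):
            hi = m2  # a minimizer of the convex function lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return phi(0.5 * (lo + hi))
```

The search interval $[0, 1]$ suffices because the stationary point $\alpha = 1/(1 + \bar\theta^{1/(p-1)})$ always lies strictly between 0 and 1.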

One easy thing to see is that as $k \to \infty$, we have $f_k \to 1$. This also means that
if we make $n$ arbitrarily large, the objective value converges to 1. By the binomial
theorem, we can calculate that the leading term of $1 - f_k$ is

\[ \frac{(p-1)(k\theta)^{(p-2)/(p-1)}}{[1 + (k\theta)^{1/(p-1)}]^{p-1}}. \]

This leading term converges to zero as $k \to \infty$ at the rate of $\Theta\big(\frac{1}{k^{1/(p-1)}}\big)$, while the
other terms converge to zero at a faster rate.
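For readers who want the binomial expansion spelled out, the following worked computation is one way to arrive at the leading term. The abbreviation $t$ is introduced here only for convenience and is not notation from the paper.

```latex
\[
t := (k\theta)^{1/(p-1)}, \qquad f_k = \frac{t^{p-1}}{(1+t)^{p-1}},
\]
\[
1 - f_k = \frac{(1+t)^{p-1} - t^{p-1}}{(1+t)^{p-1}}
        = \frac{(p-1)t^{p-2} + \binom{p-1}{2}t^{p-3} + \cdots + 1}{(1+t)^{p-1}},
\]
\[
\frac{(p-1)t^{p-2}}{(1+t)^{p-1}} \sim \frac{p-1}{t} = (p-1)(k\theta)^{-1/(p-1)}
\quad \text{as } k \to \infty .
\]
```

The remaining terms in the numerator have lower powers of $t$, so each of them decays at least as fast as $t^{-2}$, that is, faster than the leading term.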

Two conclusions can be made with the example presented in this section. 

• The case of $p = 2$ gives a convergence rate of $O(\frac{1}{k})$ for the objective value.
This suggests that the methods presented for strongly convex objective 
functions in Section 3] are the best possible up to a constant, that the 
methods in Section O are close to the best possible. 

• The case of $p$ being an arbitrarily large even number gives a convergence rate
of $O\big(\frac{1}{k^{1/(p-1)}}\big)$. This suggests that if the objective function is not strongly
convex, it would be more sensible to use the subgradient algorithm (Algo¬ 
rithm [SU to solve (EH) instead. 


7. Lower bounds on rate of Haugazeau’s algorithm 

In this section, we give two examples in separate subsections to show the behavior
of Haugazeau’s algorithm. The first example shows the $O(1/k)$ convergence rate of
the objective value in the case of the intersection of two halfspaces. This suggests
that the convergence rate of $O(1/k)$ is typical. The second example shows that
Haugazeau’s algorithm converges arbitrarily slowly in a convex problem when the
linear metric inequality is not satisfied.

The lemma below will be used for both examples.

Lemma 7.1. (Lower bound of convergence of a sequence) Let $p > 1$. Suppose
$\{\alpha_k\}_{k=1}^{\infty}$ is a strictly decreasing sequence of real numbers converging to zero, and
there is some $\gamma > 0$ such that $\alpha_{k+1} \ge \alpha_k(1 - \gamma\alpha_k^p)$ for all $k$. Then we can find a
constant $M_2 > 0$ such that $\alpha_k \ge \frac{1}{\sqrt[p]{2p\gamma(k + M_2)}}$ for all $k \ge 1$.


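Proof. By Taylor’s Theorem on the function $f(x) := (1 - x)^p$, we can choose $M_2$
large enough so that

\[ \left[1 - \frac{1}{2p(k + M_2)}\right]^p \ge 1 - \frac{p+1}{2p(k + M_2)} \quad \text{for all } k \ge 0. \tag{7.1} \]

We can increase $M_2$ if necessary so that

(1) $(k + M_2 + 1)\big(k + M_2 - \frac{p+1}{2p}\big) \ge (k + M_2)^2$ for all $k \ge 0$,

(2) $\alpha_1 \ge \frac{1}{\sqrt[p]{2p\gamma(1 + M_2)}}$, and

(3) the map $\alpha \mapsto \alpha(1 - \gamma\alpha^p)$ is strictly increasing in the interval $\big[0, \frac{1}{\sqrt[p]{2p\gamma(1 + M_2)}}\big]$.

We now show that $\alpha_k \ge \frac{1}{\sqrt[p]{2p\gamma(k + M_2)}}$ implies $\alpha_{k+1} \ge \frac{1}{\sqrt[p]{2p\gamma(k + M_2 + 1)}}$ for all $k \ge 1$,
which would complete our proof. Now, making use of the fact that $\{\alpha_k\}$ is strictly
decreasing and (3), we have

\[ \alpha_{k+1} \ge \alpha_k(1 - \gamma\alpha_k^p) \ge \frac{1}{\sqrt[p]{2p\gamma(k + M_2)}}\left[1 - \frac{1}{2p(k + M_2)}\right]. \]

Combining (7.1) and (1) gives

\[ (k + M_2 + 1)\left[1 - \frac{1}{2p(k + M_2)}\right]^p \ge (k + M_2 + 1)\left[1 - \frac{p+1}{2p(k + M_2)}\right] \ge k + M_2. \]

A rearrangement of the above inequality gives

\[ \alpha_{k+1} \ge \alpha_k(1 - \gamma\alpha_k^p) \ge \frac{1}{\sqrt[p]{2p\gamma(k + M_2)}}\left[1 - \frac{1}{2p(k + M_2)}\right] \ge \frac{1}{\sqrt[p]{2p\gamma(k + M_2 + 1)}}, \]

which is what we need. □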
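The behavior described by Lemma 7.1 can be explored numerically on the extreme case where the recursion holds with equality. The sketch below is illustrative only: it reads the lower bound as $(2p\gamma(k + M_2))^{-1/p}$, and the starting value, $\gamma$, $p$ and the trial value of $M_2$ used in any experiment are assumptions chosen by the user, not values from the paper.

```python
def worst_case_sequence(alpha1, gamma, p, n):
    """alpha_{k+1} = alpha_k * (1 - gamma * alpha_k**p), for k = 1, ..., n - 1."""
    alphas = [alpha1]
    for _ in range(n - 1):
        a = alphas[-1]
        alphas.append(a * (1.0 - gamma * a ** p))
    return alphas

def lower_bound(k, gamma, p, m2):
    """The bound 1 / (2*p*gamma*(k + m2))**(1/p), as read here from Lemma 7.1."""
    return (2.0 * p * gamma * (k + m2)) ** (-1.0 / p)
```

Comparing `worst_case_sequence(0.5, 1.0, 3, 2000)` with `lower_bound(k, 1.0, 3, 2.0)` term by term is one way to see that the sequence cannot decay faster than order $k^{-1/p}$.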


7.1. The case of two halfspaces. Let $\theta \in \mathbb{R}$ be such that $0 < \theta < \pi/2$. Consider
the problem of projecting the point $x_0 = (1,0) \in \mathbb{R}^2$ onto $H_+ \cap H_-$, where $H_+$ and
$H_-$ are halfspaces in $\mathbb{R}^2$ defined by

\[ H_\pm := \{(u,v) \in \mathbb{R}^2 : \pm v \ge u/\tan\theta\}. \]

See Figure 7.1. It is clear that $P_{H_+ \cap H_-}(x_0) = (0,0)$. We let $\mathbf{0} := (0,0)$ to simplify
notation. Haugazeau’s algorithm would be able to discover the two halfspaces in
two steps and solve the problem by quadratic programming. But suppose that
somehow we have an iterate $x_1$ that lies on the boundary of $H_+$ that is close to
$\mathbf{0}$. A similar situation arises, for example, in projecting a point onto the intersection of many
halfspaces. An analysis of this modified problem gives us an indication
of how Haugazeau’s algorithm can perform for larger problems.

For our modified problem, the iterates $x_i$ would lie on the boundary of either
$H_+$ or $H_-$. For the iterate $x_k$, let $\alpha_k$ be the distance $\|x_k - (0,0)\|$. This is marked
on Figure 7.1. The cosine rule gives us the following equations.

\[ \|x_k - x_{k+1}\|^2 = \alpha_k^2 + \alpha_{k+1}^2 - 2\alpha_k\alpha_{k+1}\cos 2\theta \tag{7.2a} \]

\[ \|x_0 - x_k\|^2 = \alpha_k^2 + 1 - 2\alpha_k\cos\theta \tag{7.2b} \]

\[ \|x_0 - x_{k+1}\|^2 = \alpha_{k+1}^2 + 1 - 2\alpha_{k+1}\cos\theta. \tag{7.2c} \]

Figure 7.1. Illustration of example in Subsection 7.1.

Pythagoras’s theorem gives us $\|x_k - x_{k+1}\|^2 + \|x_0 - x_k\|^2 = \|x_0 - x_{k+1}\|^2$. Together
with the above equations, we have

\[ \begin{aligned}
2\alpha_k^2 - 2\alpha_k\alpha_{k+1}\cos 2\theta - 2\alpha_k\cos\theta &= -2\alpha_{k+1}\cos\theta \\
\alpha_{k+1}[\cos\theta - \alpha_k\cos 2\theta] &= -\alpha_k^2 + \alpha_k\cos\theta \\
\alpha_{k+1} &= \alpha_k\,\frac{\cos\theta - \alpha_k}{\cos\theta - \alpha_k\cos 2\theta} \\
&= \alpha_k\left(1 - \alpha_k\,\frac{1 - \cos 2\theta}{\cos\theta - \alpha_k\cos 2\theta}\right).
\end{aligned} \]

Since {ak} is a strictly decreasing positive sequence which converges to zero, we 
have ak > afe(l — afeq) for all k large enough, where 7 = ^ 2 cose^ • Lemma ITTI 
the convergence of {\\xk — Ph+oH-( xo)||} to zero is at best 0{\/k). 

Let $f_k = \|x_0 - x_k\|^2$. To see the rate of how $f_k$ converges to 1, we note from
(7.2b) that $1 - f_k = 2\alpha_k\cos\theta - \alpha_k^2$. Then the convergence rate of $f_k$ to 1 is of
$O(1/k)$.
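The $1/k$ behavior can be observed by iterating the recursion $\alpha_{k+1} = \alpha_k(\cos\theta - \alpha_k)/(\cos\theta - \alpha_k\cos 2\theta)$ directly. The sketch below is a numerical illustration only; the angle $\theta = \pi/4$, the starting distance 0.1 and the number of iterations are arbitrary choices made here, and the function name is hypothetical.

```python
import math

def alpha_sequence(theta, alpha1, n):
    """alpha_{k+1} = alpha_k*(cos(theta) - alpha_k)/(cos(theta) - alpha_k*cos(2*theta))."""
    c1, c2 = math.cos(theta), math.cos(2.0 * theta)
    alphas = [alpha1]
    for _ in range(n - 1):
        a = alphas[-1]
        alphas.append(a * (c1 - a) / (c1 - a * c2))
    return alphas

THETA = math.pi / 4.0
alphas = alpha_sequence(THETA, 0.1, 10000)
scaled = [k * a for k, a in enumerate(alphas, start=1)]        # k * alpha_k
gaps = [2.0 * a * math.cos(THETA) - a * a for a in alphas]     # 1 - f_k from (7.2b)
```

If $\alpha_k$ decays like $1/k$, the entries of `scaled` should settle near a positive constant, and `gaps` should decay at the same rate as `alphas`.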


7.2. The case of no linear metric inequality. Let $p > 1$ be some parameter.
Consider the problem of projecting the point $(1,0) \in \mathbb{R}^2$ onto the intersection of
the sets $C_+ \cap C_-$, where

\[ C_\pm = \{(u,v) \in \mathbb{R}^2 : \pm v \ge |u|^p\}. \]

The diagram for this problem is similar to that of the one in Subsection 7.1. The
linear metric inequality is not satisfied in this case. It is clear that the projection
of $(1,0)$ onto $C_+ \cap C_-$ is $(0,0)$. We try to show that the parameter $p$ can be
made arbitrarily large, so that the convergence of the iterates $x_k$ to $\mathbf{0} = (0,0)$ is
arbitrarily slow. We let $x_k = (u_k, v_k)$.

Proposition 7.2. The iterates $x_k$ satisfy

\[ x_k \notin \mathrm{int}(C_+) \cup \mathrm{int}(C_-). \tag{7.3} \]

Proof. This is easily seen to be true for $k = 1$. We now prove that (7.3) holds
for all $k$ by induction. Without loss of generality, suppose that for iterate $x_k$,
its second coordinate $v_k$ is positive. The next iterate $x_{k+1}$ is the intersection of
the line passing through $x_k$ perpendicular to $x_0 - x_k$ and a supporting hyperplane
of $C_-$. It is therefore clear that $x_{k+1} \notin \mathrm{int}(C_-)$. We also see that $u_{k+1} < u_1$.
From the convexity of $C_+$, if a point $x = (u,v)$ is such that $u > 0$, $u < u_1$ and
$x \notin \mathrm{int}(C_+)$, then $\angle x_0 x \mathbf{0} > \pi/2$. Given that $\angle x_0 x_k \mathbf{0} > \pi/2$ and $x_k \notin \mathrm{int}(C_+)$, we
have $x_{k+1} \notin \mathrm{int}(C_+)$ as well. □

Next, we bound the rate of decrease of $u_k$.


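Proposition 7.3. Continuing the discussion in this subsection, we have

\[ u_{k+1} \ge u_k\left[1 - \frac{2u_k^{2p-1}}{1 - u_k + u_k^{2p-1}}\right]. \]

Proof. We assume without loss of generality that $x_k = (u_k, v_k)$ is such that $v_k > 0$.
By Proposition 7.2, we have $v_k \le u_k^p$.

Consider the point $\tilde{x}_{k+1}$ defined by the intersection of the line through $x_k$ perpendicular
to $x_k - x_0$ and the line passing through $\mathbf{0}$ and $(u_k, -u_k^p)$. One can use
geometrical arguments to see that $u_{k+1} \ge \tilde{u}_{k+1}$, where $\tilde{u}_{k+1}$ is the first coordinate
of $\tilde{x}_{k+1}$ and $u_{k+1}$ is the first coordinate of $x_{k+1}$. We now bound $\tilde{u}_{k+1}$ from below.

The point $\tilde{x}_{k+1}$ is of the form $\lambda(u_k, -u_k^p)$. From $[x_k - x_0] \perp [x_k - \tilde{x}_{k+1}]$, we have

\[ \begin{aligned}
\big\langle (u_k - 1, u_k^p - 0), (u_k - \lambda u_k, u_k^p + \lambda u_k^p) \big\rangle &= 0 \\
\lambda u_k(1 - u_k) + \lambda u_k^{2p} + (u_k - 1)u_k + u_k^{2p} &= 0 \\
\lambda\big(1 - u_k + u_k^{2p-1}\big) &= 1 - u_k - u_k^{2p-1}.
\end{aligned} \]

This gives

\[ u_{k+1} \ge \tilde{u}_{k+1} = \lambda u_k = u_k\,\frac{1 - u_k - u_k^{2p-1}}{1 - u_k + u_k^{2p-1}} = u_k\left[1 - \frac{2u_k^{2p-1}}{1 - u_k + u_k^{2p-1}}\right], \]

which ends our proof. □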
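To get a feel for how the bound $u_{k+1} \ge \lambda u_k$ with $\lambda = (1 - u_k - u_k^{2p-1})/(1 - u_k + u_k^{2p-1})$ slows down as $p$ grows, one can iterate it with equality. The sketch below does exactly that; it follows only this lower-bound recursion, not the actual iterates of Haugazeau's algorithm, and the starting value 0.5 and the function name are illustrative assumptions.

```python
def u_lower_sequence(p, u1, n):
    """Iterate u <- u*(1 - 2*u**(2p-1)/(1 - u + u**(2p-1))), starting from u1."""
    us = [u1]
    for _ in range(n - 1):
        u = us[-1]
        w = u ** (2 * p - 1)
        us.append(u * (1.0 - 2.0 * w / (1.0 - u + w)))
    return us

u_p2 = u_lower_sequence(2, 0.5, 1000)
u_p3 = u_lower_sequence(3, 0.5, 1000)
```

After the same number of steps, the sequence for $p = 3$ stays well above the one for $p = 2$, in line with a decay of order $k^{-1/(2p-1)}$.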


We now make an estimate of how $\|x_k - x_0\|^2$ converges to the optimal objective
value of 1 by analyzing $1 - \|x_k - x_0\|^2$. We have

\[ \begin{aligned}
& (1 - u_k)^2 \le \|x_k - x_0\|^2 \le (1 - u_k)^2 + u_k^{2p} \\
\Rightarrow\ & 1 - (1 - u_k)^2 - u_k^{2p} \le 1 - \|x_k - x_0\|^2 \le 1 - (1 - u_k)^2 \\
\Rightarrow\ & 2u_k - u_k^2 - u_k^{2p} \le 1 - \|x_k - x_0\|^2 \le 2u_k - u_k^2.
\end{aligned} \]

This means that $\{1 - \|x_k - x_0\|^2\}$ converges to zero at the same rate $\{2u_k\}$ converges
to zero. By Lemma 7.1 and Proposition 7.3, the convergence of $\{u_k\}$ to zero is
seen to be at best $O\big(\frac{1}{\sqrt[2p-1]{k}}\big)$. This means that as we make $p$ arbitrarily large, the
convergence of Haugazeau’s algorithm can be arbitrarily slow in the absence of the
linear metric inequality. It appears that enforcing the condition

\[ C_+ \cap C_- \subset \{x : \langle x_0 - x_k, x - x_k\rangle \le 0\} \]

makes Haugazeau’s algorithm perform slower than the subgradient algorithm.